Method and apparatus for data clustering including segmentation and boundary detection

ABSTRACT

A method and apparatus for clustering data, particularly image data, that constructs a graph in which each node represents a pixel of the image and every two nodes representing neighboring pixels are associated by a coupling factor. Block pixels are selected, and unselected neighboring pixels are coupled with a selected block to form aggregates. The graph is coarsened recursively by performing iterated weighted aggregation to form larger blocks (aggregates) and obtain a hierarchical decomposition of the image while forming a pyramid structure over the image. Saliency of segments is detected in the pyramid, and a degree of attachment of every pixel to each of the blocks in the pyramid is computed recursively. The pyramid is scanned from coarse to fine, starting at the level at which a segment is detected and proceeding to lower levels, rebuilding the pyramid before continuing to the next higher level. Relaxation sweeps sharpen the boundaries of a segment.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a method and apparatus for data clustering, especially regarding images, utilizing a novel fast, multiscale algorithm. More particularly, the present invention relates to a method and apparatus for image segmentation and boundary detection.

[0003] 2. Description of the Prior Art

[0004] Data clustering is the task of inferring properties of large amounts of data. Clustering is obtained through a process of unsupervised learning in which the data is split into clusters, which reveal its inner structure. Methods are known which employ a large class of graph algorithms adapted to deal with the data clustering problem, and more particularly, the segmentation problem. The algorithms employed typically construct a graph in which the nodes represent the pixels in the image and arcs represent affinities (“couplings”) between nearby pixels. In these methods, the image is segmented by minimizing a cost associated with cutting the graph into sub-graphs. In the simpler version, the cost is the sum of the affinities across the cut. Other versions normalize this cost by dividing it by the overall area of the segments or by a measure derived from the affinities between nodes within the segments. Normalizing the cost of a cut prevents over-segmentation of the image. Attaining a globally optimal solution for normalized-cuts measures is known to be NP-hard even for planar graphs. Optimal cuts for some variations of normalized-cuts measures can be found in polynomial time, but the runtime complexity of these methods is O(N² log N), where N denotes the number of pixels in the image. Therefore, approximation methods are used. The most common approximation method uses spectral techniques to find a solution. These spectral methods are analogous to finding the principal modes of certain physical systems. With these methods, and exploiting the sparseness of the graph, a cut can be found in O(N^(3/2)).

SUMMARY OF THE INVENTION

[0005] The method and apparatus of the present invention have utility for a variety of applications. For example, the invention can be used in correlating correspondence of images, that is, to show that image 1 corresponds to image 2 and to outline the exact correspondence. Also, the invention can be used to construct and manipulate 3D images in similar ways. The invention has application in the field of transferring images on the web. Where one maintains a large database and retrieves from it a particular image, it is possible to scan for a coarse representation of the image and then send or transmit only the coarse representation of the image as the first step in retrieval. This reduces bandwidth requirements and completes the transaction in substantially less time. Further, the invention has application in registration problems where a camera is in motion, changing its position, and it is desired to compare and register two images obtained from different positions or angles. The invention also has applicability in the fields of imaging, such as satellite images, electron microscopic images and medical images, particularly as regards analysis of the image to derive salient features towards their classification. The invention provides a decomposition algorithm that can learn to provide feedback. In essence, the image in electronic form is decomposed and represented in a coarser form in a manner that contains the essential information regarding its properties, including shape, boundaries, statistics, texture, color, differences, intensities and any other attribute or property of the image that can be measured.

[0006] In accordance with the present invention, a method and apparatus is provided that in a first embodiment employs a novel and unique fast graph algorithm for clustering that finds an approximate solution to normalized cut measures and whose runtime is linear in the number of edges in the graph. By the practice of the present invention, in just one pass the algorithm provides a complete hierarchical decomposition of the graph into clusters. The novelty and uniqueness of the present invention is demonstrated by applying it to the case of image segmentation. Image segmentation is a process of grouping together neighboring pixels whose properties (e.g., intensity values) are coherent. The resulting regions may indicate the presence of objects or parts of objects, and may be verified (or modified) later following a top-down analysis of the image and recognition. One principal advantage of the present invention is that the novel algorithm employed for segmentation has been constructed efficiently so that it can faithfully extract regions of different sizes from an image.

[0007] Like the prior art, the method and apparatus of the present invention employ the unique algorithm to find an approximate solution to a normalized cut problem, but the present invention is distinguished from the known art by doing so in time that is linear in the number of pixels in the image with only a few dozen operations per pixel. Since a typical image may contain several hundreds of thousands of pixels, the factor √N gained may be quite significant. The algorithm is based on representing the same minimization problem at different scales, enabling fast extraction of the segments that minimize the optimization criterion. Because of its multiscale nature, the algorithm provides a full hierarchical decomposition of the image into segments in just one pass. In addition, it allows modifying the optimization criterion with scale, enabling incorporation of higher order statistics of the segments when their size is sufficiently large to allow reliable extraction of such statistics. The algorithm relates to the same physical systems whose modes are found by the spectral methods, but uses modern numeric techniques that provide a fast and accurate solution to these problems. The results of running the novel algorithm on a variety of images present a significant improvement over the results obtained by the spectral methods.

[0008] In the method and apparatus of the present invention, the novel algorithm proceeds as follows. Given an image, first a graph is constructed so that every pixel is a node in the graph and neighboring pixels are connected by an arc. A weight is associated with the arc reflecting the likelihood that the corresponding pixels are separated by an edge. To find the minimal cuts in the graph, the graph is recursively coarsened using a unique weighted aggregation procedure in which repeatedly smaller sets of representative pixels (blocks) are selected. These representative pixels do not have to lie on a regular grid, giving rise to an irregular pyramid. The purpose of these coarsening steps is to produce smaller and smaller graphs that faithfully represent the same minimization problem. In the course of this process segments that are distinct from their environment emerge and they are detected at their appropriate size scale. After constructing the entire pyramid, the pyramid is scanned from the top down performing relaxation sweeps to associate each pixel with the appropriate segment.

[0009] In the simple version of the unique algorithm, the couplings between block pixels at a coarse level are computed directly from the couplings between finer level pixels. In a variation of this algorithm, the couplings are modified between block pixels to reflect certain global statistics of each block. These statistics can be computed recursively throughout the coarsening process and may include the average intensity level of the blocks, the position of their center, their principal orientation, their area, texture measurements, etc. This enables, for example, identification of large segments even if the intensity levels separating them vary gradually.

[0010] Although pyramidal structures have been used in many algorithms for segmentation, methods that use regular pyramids have difficulties in extracting regions of irregular structure. Previously known methods that construct irregular pyramids are strongly affected by local decisions. Fuzzy C-means clustering algorithms avoid such premature decisions, but they involve a slow iterative process. Also related, and of general interest, are algorithms motivated by physical processes.

[0011] The method and apparatus of the present invention employs an algorithm that uses modern numeric techniques to find an approximate solution to normalized cut measures in time that is linear in the size of the image (or more generally, in the number of edges in the clustering graph) with only a few dozen operations per pixel. In just one pass the algorithm provides a complete hierarchical decomposition of the image into segments. The algorithm detects the segments by applying a process of recursive coarsening in which the same minimization problem is represented with fewer and fewer variables, producing an irregular pyramid. During this coarsening process the method can compute additional internal statistics of the emerging segments and use these statistics to facilitate the segmentation process. Once the pyramid is completed it is scanned from the top down to associate pixels close to the boundaries of segments with the appropriate segment. The efficacy of the method is demonstrated by applying it to real images.

[0012] In a further embodiment of the invention, an improved method and apparatus is described that utilizes the segmentation by weighted aggregation (SWA) noted above and provides improvements that enable the inventive method and apparatus to achieve a superior result. This improvement will be described in detail hereinafter, and the advantages that accrue will become more evident.

[0013] Other objects and advantages of the present invention will become more readily apparent from the following detailed description of the method and apparatus of the present invention when taken in conjunction with the appended drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] FIG. 1a shows an input image; FIG. 1b shows the picture divided into two segments extracted by the method and apparatus of the present invention at scale 11 (the boundaries of the segments being highlighted); FIG. 1c shows at scale 8 five segments standing out, two capturing most of the bodies of the two players, one capturing the hand of one of the players, and one capturing the head of the other; and FIG. 1d shows at scale 7 smaller segments, separating some of the body parts of the two players.

[0015] FIG. 2a shows an input image; FIG. 2b shows the input image decomposed at level 10 into four segments, one of which captures the lioness depicted; FIG. 2c shows the bottom segment further decomposed at level 8 into three segments, splitting the cub and stone from the ground.

[0016] FIG. 3a shows an input image; FIG. 3b shows the image decomposed into three segments obtained at scale 10, capturing the skies, the grass, and a single segment that includes the cow and the hilly background; FIG. 3c shows at scale 9 the cow separated from the hills, and the grass split into two segments.

[0017] FIG. 4a shows an input image; FIG. 4b shows at the coarsest scale the grass separated from the cows (except for the bright back of the cow which was decomposed later from the grass); FIG. 4c shows the three cows then split (with the rightmost cow split into two segments), the body parts of the cows obtained in the lower scale.

[0018] FIGS. 5a to d show in FIG. 5a the effect of average intensities of 10×10 pixel squares, in FIG. 5b their bilinear interpolation, in FIG. 5c average intensities of 73 pixel aggregates and in FIG. 5d their interpolations; the original image is shown in FIG. 6a.

[0019] FIGS. 6a to d show in FIG. 6a the original input image, in FIG. 6b the result of the method of the present invention, in FIG. 6c the result of the application of the SWA and in FIG. 6d the result of the prior art (the so-called “normalized-cuts” spectral method).

[0020] FIGS. 7a to d show in FIG. 7a an original input image, in FIG. 7b the result of the application of the present invention, in FIG. 7c the application of the SWA and in FIG. 7d the result of the prior art (the so-called “normalized-cuts” spectral method).

[0021] FIGS. 8a to d show in FIG. 8a an input image, in FIG. 8b the result of the application of the present invention, in FIG. 8c the application of the SWA and in FIG. 8d the result of the prior art (the so-called “normalized-cuts” spectral method).

[0022] FIGS. 9a to c show in FIG. 9a an input image, in FIG. 9b the result of the application of the present invention and in FIG. 9c the result of the prior art (the so-called “normalized-cuts” spectral method).

[0023] FIGS. 10a and b show in FIG. 10a an input image and in FIG. 10b the result of the application of the present invention.

[0024] FIGS. 11a to c show in FIG. 11a an input image, in FIG. 11b the result of the application of the present invention and in FIG. 11c the result of the prior art (the so-called “normalized-cuts” spectral method).

[0025] FIGS. 12a and b show in FIG. 12a an input image and in FIG. 12b the result of the application of the present invention.

[0026] FIG. 13 shows a generalized flow chart of the method of the present invention.

[0027] FIG. 14 is a block diagram of an exemplary computer system useful for implementing the present invention.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION

[0028] The inventive method and apparatus will now be described in detail with respect to specific embodiments. In the first embodiment, given an image Ω that contains N = n² pixels, a graph is constructed in which each node represents a pixel and every two nodes representing neighboring pixels are connected by an arc. In the implementation of the method each node is connected to the four neighbors of the respective pixel, producing a planar graph. It should be noted, however, that the inventive method can be applied also to non-planar graphs. In fact, the graphs obtained following the coarsening steps to be described hereinafter are non-planar. In the discussion below, a pixel is denoted by an index i ∈ {1, 2, ..., N} and its intensity by g_i. To every arc connecting two neighboring pixels i and j, a positive coupling value a_ij is assigned reflecting the degree to which they tend to belong to the same segment. For example, a_ij could be a decreasing function of |g_i − g_j|. In the implementation, local responses to edge filters are used to determine the couplings between elements (see below).
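The graph construction in this paragraph can be illustrated with a minimal sketch. It assumes the simplest choice mentioned above, a coupling that decreases with the intensity difference, here exp(−μ|g_i − g_j|); the function name `grid_couplings`, the row-major pixel indexing and the value of `mu` are hypothetical choices made for this illustration and are not taken from the specification.

```python
import math

def grid_couplings(image, mu=0.1):
    """Return {(i, j): a_ij} for 4-connected neighbors, with i < j.

    Pixels are indexed row-major: i = row * width + col.
    The coupling exp(-mu * |g_i - g_j|) is an assumed example of a
    decreasing function of the intensity difference.
    """
    height, width = len(image), len(image[0])
    couplings = {}
    for r in range(height):
        for c in range(width):
            i = r * width + c
            # Visiting only the right and down neighbors lists each pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    j = rr * width + cc
                    diff = abs(image[r][c] - image[rr][cc])
                    couplings[(i, j)] = math.exp(-mu * diff)
    return couplings

# A 2x3 image with a sharp vertical edge between columns 1 and 2.
image = [[10, 10, 200],
         [10, 10, 200]]
a = grid_couplings(image)
```

In this toy image, pairs inside the flat region receive a coupling of 1, while pairs straddling the edge receive a coupling close to 0.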

[0029] To detect the segments, a state vector u = (u_1, u_2, ..., u_N) is associated with the graph, where u_i ∈ R is a state variable associated with pixel i. We define a segment S_m as a collection of pixels, $S_m = \{i_{m_1}, i_{m_2}, \ldots, i_{m_{n_m}}\}$,

[0030] and associate with it a state vector u^(m) = (u_1^(m), u_2^(m), ..., u_N^(m)), in which $u_i^{(m)} = \begin{cases} 1 & \text{if } i \in S_m \\ 0 & \text{if } i \notin S_m \end{cases} \qquad (1)$

[0031] In practice, the state variables are allowed to take non-binary values. In particular, it is expected that pixels near fuzzy sections of the boundaries of a segment may have intermediate values 0 < u_i^(m) < 1 reflecting their relative tendency to belong to either the segment or its complement.

[0032] Next, an energy functional is defined to rank the segments. Consider first the functional $E(u) = \sum_{\langle i,j \rangle} a_{ij} (u_i - u_j)^2, \qquad (2)$

[0033] where the sum is over all pairs of adjacent pixels i and j. Clearly, for an ideal segment (with only binary state variables) E(u^(m)) sums the coupling values along the boundaries of S_m. With such a cost function small segments (and similarly very large ones) are often encouraged. To avoid such a preference, this energy can be modified as follows:

Γ(u) = E(u)/V^α(u),  (3)

[0034] where V(u) denotes the “volume” of the respective segment, V(u) = Σ_i u_i, and α is some predetermined parameter. Thus, for example, V(u^(m)) will measure the area in pixels of S_m. To avoid selecting very large segments we consider only segments whose total volume is less than half of the entire image. This is equivalent to defining the volume as min{V(u), N − V(u)}. Alternatively, the volume can be replaced by the product V(u)(N − V(u)). This and similar modifications of Γ(u) can also be incorporated in the fast algorithm.

[0035] Note that setting α = 0.5 will eliminate size preference, since Γ(u^(m)) in this case is roughly the average of the couplings along the boundary of S_m: E(u^(m)) is the sum of the couplings along the boundary of S_m, and $\sqrt{V(u^{(m)})}$

[0036] is roughly proportional to the perimeter of S_m. In contrast, setting α > 0.5 will create a preference for large segments. In the implementation α = 1 was used, which is equivalent to the so-called “average” or “normalized” cut measures.
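To make the ranking functional concrete, the following minimal sketch evaluates E(u) of equation 2, V(u) and Γ(u) of equation 3 on a four-pixel chain. The dictionary layout for the couplings and the function names are assumptions made for this illustration only.

```python
def energy(couplings, u):
    """E(u): sum of a_ij * (u_i - u_j)^2 over adjacent pairs (equation 2)."""
    return sum(a * (u[i] - u[j]) ** 2 for (i, j), a in couplings.items())

def volume(u):
    """V(u): sum of the state variables, i.e. the area of a binary segment."""
    return sum(u)

def gamma(couplings, u, alpha=1.0):
    """Gamma(u) = E(u) / V(u)^alpha (equation 3)."""
    return energy(couplings, u) / volume(u) ** alpha

# Chain 0-1-2-3 whose middle arc is weak, i.e. an edge lies between 1 and 2.
couplings = {(0, 1): 1.0, (1, 2): 0.1, (2, 3): 1.0}
u_good = [1, 1, 0, 0]   # segment {0, 1}: cuts only the weak arc
u_bad = [1, 0, 0, 0]    # segment {0}: cuts a strong arc

gamma_good = gamma(couplings, u_good)
gamma_bad = gamma(couplings, u_bad)
```

With α = 1, the segment that follows the weak arc obtains the lower value of Γ, which is the behavior the functional is designed to reward.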

[0037] Finally, the volume of u can be generalized by replacing V(u) by $V_{\varphi}(u) = \sum_{i=1}^{N} \varphi_i u_i, \qquad \sum_{i=1}^{N} \varphi_i = N, \qquad (4)$

[0038] where φ_i is a “mass” assigned to the pixel i. This will become important in coarser steps when nodes may draw their mass from sets of pixels of different size. Also, in the finest scale we may assign lower volumes to pixels at “less interesting” or “less reliable” parts of the image, e.g., along its margins.

[0039] The inventive method and apparatus includes recursive step-by-step coarsening of the segmentation. In each coarsening step a new, approximately equivalent segmentation will be defined, reducing the number of state variables to a fraction (typically between ¼ and ½) of the former number. The coarsening is constructed such that each of the coarse variables represents several fine variables with different weights, and every fine variable will be represented by several coarse variables with different weights. The low-energy configurations of the coarse problem will reflect the low-energy configurations of the fine problem.

[0040] Below is described the first coarsening step. The state variables in the coarser problem can be thought of as the values (ideally 0 or 1) of a diluted set of pixels, i.e., a subset C of the original set of pixels. The values u_i associated with the rest of the pixels (i ∉ C) will be determined from the coarse state variables using pre-assigned dependence rules. These rules will define Γ(u) as a functional of the smaller set of variables C, i.e., Γ^c({u_i}_{i∈C}), and the dependence rules are selected so that the detection of segments with small Γ^c (in the coarser problem) would lead to segments with small Γ (in the fine problem).

[0041] Generally, for any chosen subset of indices $C = \{c_k\}_{k=1}^{K} \subset \{1, 2, \ldots, N\}$,

[0042] denote u_{c_k} as U_k. Dependence rules are chosen in the form of a weighted interpolation rule: $u_i = \sum_{k=1}^{K} w_{ik} U_k, \qquad (5)$

[0043] where w_ik ≥ 0, $\sum_{k=1}^{K} w_{ik} = 1$,

[0044] and for i = c_k ∈ C, w_ik = 1. Only local interpolation rules are considered, i.e., w_ik = 0 for all pixels c_k not in the neighborhood of pixel i. The values of w_ik will be determined by the coupling values only, and will not depend on the values of the state variables (see below).

[0045] Substituting equation 5 (weighted interpolation) into equation 2 (fine energy), the result is $E^{c}(U) \equiv E(u) = \sum_{k,l} A_{kl} (U_k - U_l)^2, \qquad (6)$

[0046] where the couplings A_kl between the coarse-level variables are given by $A_{kl} = \sum_{i \neq j} a_{ij} (w_{jl} - w_{il})(w_{ik} - w_{jk}). \qquad (7)$
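As a sketch of why the coarse couplings take this form, the following derivation substitutes the interpolation rule of equation 5 into a single term of the fine energy. It relies only on the normalization Σ_k w_ik = 1. The shorthand d_k is introduced here purely for illustration, and the overall constant depends on whether pairs are counted once or twice in each sum.

```latex
\begin{align*}
u_i - u_j &= \sum_{k=1}^{K} d_k U_k, \qquad d_k = w_{ik} - w_{jk}, \qquad \sum_{k=1}^{K} d_k = 0, \\
(u_i - u_j)^2 &= \sum_{k,l} d_k d_l \, U_k U_l
  = -\tfrac{1}{2} \sum_{k,l} d_k d_l \, (U_k - U_l)^2 \\
  &= \tfrac{1}{2} \sum_{k,l} (w_{ik} - w_{jk})(w_{jl} - w_{il}) \, (U_k - U_l)^2 .
\end{align*}
```

The second equality holds because the terms in U_k² and U_l² vanish when the d_k sum to zero. Multiplying by a_ij and summing over pairs of pixels then gives a quadratic form in the differences U_k − U_l whose coefficients are proportional to the sum of a_ij (w_jl − w_il)(w_ik − w_jk).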

[0047] In addition, substituting equation 5 (weighted interpolation) into equation 4 (fine volume), the result is $V^{c}(U) \equiv V_{\varphi}(u) = V_{\Phi}(U) = \sum_{k=1}^{K} \Phi_k U_k \qquad (8)$

[0048] where $\Phi_k = \sum_{i} \varphi_i w_{ik}, \qquad k = 1, \ldots, K. \qquad (9)$
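The following minimal sketch checks numerically that the coarse volume of equation 8 reproduces the fine volume of equation 4 when the fine state is interpolated by equation 5 and the coarse masses are given by equation 9. The three-pixel weight matrix and the function names are made-up values for illustration.

```python
def interpolate(weights, U):
    """u_i = sum_k w_ik * U_k (equation 5)."""
    return [sum(w * Uk for w, Uk in zip(row, U)) for row in weights]

def coarse_masses(weights, phi):
    """Phi_k = sum_i phi_i * w_ik (equation 9)."""
    K = len(weights[0])
    return [sum(phi[i] * weights[i][k] for i in range(len(weights)))
            for k in range(K)]

# Three fine pixels, two blocks. Pixels 0 and 2 are the block pixels;
# pixel 1 is attached to block 0 with weight 0.75 and to block 1 with 0.25.
weights = [[1.0, 0.0],
           [0.75, 0.25],
           [0.0, 1.0]]
phi = [1.0, 1.0, 1.0]   # unit mass per fine pixel
U = [1.0, 0.0]          # block 0 is in the segment, block 1 is not

u = interpolate(weights, U)
Phi = coarse_masses(weights, phi)
fine_volume = sum(p * x for p, x in zip(phi, u))
coarse_volume = sum(P * X for P, X in zip(Phi, U))
```

Both volumes agree, and the coarse masses still sum to the number of fine pixels because every row of the weight matrix sums to 1.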

[0049] Thus, the dependence rules of equation 5 (weighted interpolation) yield

Γ^c(U) ≡ Γ(u) = E^c(U)/[V^c(U)]^α.  (10)

[0050] The set C itself will be chosen in such a way that each pixel i ∉ C is strongly coupled to pixels in C. By this is meant roughly that $\sum_{c_k \in C} a_{i c_k} \geq \beta \sum_{j} a_{ij}, \qquad (11)$

[0051] where β is a control parameter. This choice will ensure that for any low-energy configurations the values of u indeed depend, to a good approximation, on those of the subset U. This choice of C is common in applying fast, multiscale AMG solvers.

[0052] Now the interpolation rule in equation 5 (weighted interpolation) will be discussed. Given a segment S_m, U^(m) is defined as $U_k^{(m)} = \begin{cases} 1 & \text{if } c_k \in S_m \\ 0 & \text{if } c_k \notin S_m \end{cases}, \qquad (12)$

[0053] and ũ^(m) is defined as the configuration interpolated from U^(m) by using equation 5. That is, $\tilde{u}_i^{(m)} = \sum_{k=1}^{K} w_{ik} U_k^{(m)}. \qquad (13)$

[0054] Note that E^c(U^(m)) = E(ũ^(m)), V^c(U^(m)) = V(ũ^(m)), and hence Γ^c(U^(m)) = Γ(ũ^(m)). A proper interpolation rule should satisfy the condition that for every S_m, Γ^c(U^(m)) = Γ(ũ^(m)) is small if and only if Γ(u^(m)) is small.

[0055] One possible interpolation rule could be that a state variable ũ_i for i ∉ C would inherit its state from the coarse state variable U_k to which it is most strongly attached (in other words, ũ_i = U_k such that a_ik is maximal). This rule, however, may lead to mistakes in assigning the correct state to the interpolated variables due to nearby outliers, which in turn may result in a noticeable increase in the energy E(ũ^(m)) associated with the segment. Consequently, the minimization problem with the coarse variables will poorly approximate the minimization problem with the fine variables.

[0056] According to the invention, the interpolation weights are set as follows: $w_{ik} = \frac{a_{i c_k}}{\sum_{l=1}^{K} a_{i c_l}}, \qquad \forall i \notin C, \; c_k \in C. \qquad (14)$

[0057] These settings are commonly used by the AMG minimizer. (For a definition of weights that leads to an even more precise interpolation, see below.) With this interpolation rule the state of a variable ũ_i, i ∉ C, is determined by several nearby coarse pixels, with pixels coupled more strongly affecting its value more.
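The weights of equation 14 can be computed as in the minimal sketch below, which also sets w_ik = 1 for a block pixel i = c_k as required by the interpolation rule. The data layout (a dictionary holding each undirected coupling once) and the function name are assumptions made for this illustration.

```python
def interpolation_weights(couplings, n, blocks):
    """Return an n-by-K weight matrix following equation 14.

    couplings: {(i, j): a_ij}, each undirected pair stored once.
    blocks:    list of the pixel indices c_k chosen as block pixels.
    """
    index_of = {c: k for k, c in enumerate(blocks)}
    K = len(blocks)
    weights = [[0.0] * K for _ in range(n)]
    for c, k in index_of.items():
        weights[c][k] = 1.0              # a block pixel represents itself
    for (i, j), a in couplings.items():
        for p, q in ((i, j), (j, i)):    # look at the arc from both ends
            if p not in index_of and q in index_of:
                weights[p][index_of[q]] += a
    for i in range(n):
        if i not in index_of:
            total = sum(weights[i])
            if total > 0:                # rows with no block neighbor stay zero
                weights[i] = [w / total for w in weights[i]]
    return weights

# Chain 0-1-2 with blocks at pixels 0 and 2; pixel 1 is coupled more
# strongly to pixel 0 than to pixel 2.
couplings = {(0, 1): 3.0, (1, 2): 1.0}
w = interpolation_weights(couplings, n=3, blocks=[0, 2])
```

Pixel 1 is split between the two blocks in proportion to its couplings, so three quarters of it goes to the block at pixel 0.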

[0058] It is straightforward to verify that boundary sections of a segment across which intensity variations are sharp contribute very little to the energy associated with the segment, whereas sections of the boundary across which intensity is varying gradually contribute most of the energy of the segment. It can be shown further that when the problem is coarsened the contribution of such sections in general decreases by about half. Since the volume of a segment is roughly preserved when the problem is coarsened, for a segment S_m that is distinct from its surroundings one obtains Γ^c(U^(m)) ≈ Γ(u^(m)) ≈ 0, whereas for a segment S_m that is not strongly decoupled along its boundaries Γ^c(U^(m)) ≈ ½Γ(u^(m)). Thus, under the weighted interpolation of equation 14, the problem of finding all segments S_m for which Γ(u^(m)) is below a certain threshold is approximately equivalent to the smaller, coarse problem of finding all S_m for which Γ^c(U^(m)) is below half the same threshold.

[0059] Note that the resulting coarse problem is exactly of the same form as the original problem, and hence it can in turn be reduced using the same procedure to an equivalent, yet coarser problem of the same form. This recursive coarsening process is terminated when the number of variables is sufficiently small so that the problem can be solved directly for the coarsest grid.

[0060] There is one case in which a state variable cannot be approximated accurately by the state variables of its neighbors. This happens when a salient segment S_m coincides at some scale with a single pixel i; i.e., u_i^(m) = 1 while u_j^(m) = 0 for j ≠ i. (This, of course, would not usually happen at the original, finest level, but at coarser levels of the algorithm, where “pixels” are no longer original image pixels.) Consequently, if i ∉ C, then the segment will no longer be represented at the coarser levels. But it is exactly at this point of the coarsening process that one can detect that Γ(u^(m)) is small, and hence identify the salient S_m in its natural size scale.

[0061] According to the invention the natural and useful way to interpret each coarsening step is as an aggregation step. In that view one chooses small aggregates of pixels, in terms of which the minimization problem can be reformulated with a substantially smaller number of variables. That is, one enumerates the aggregates 1, 2, ..., K, associates with the k-th aggregate a “block variable” U_k, and derives from the original minimization problem a minimization problem in terms of U_1, ..., U_K. The coarse variables, in fact, do not each have to be identified with a particular pixel during coarsening; instead, they can be identified with weighted averages of pixels.

[0062] The interpolation rule that relates the coarse to the fine pixels, equation 5 (weighted interpolation) and equation 14 (weighted average coef.), leads to a process of weighted aggregation, in which a fraction w_ik of a pixel i can be sent into the aggregate k. This fraction may be interpreted as the likelihood of the pixel i to belong to the aggregate k. These likelihoods will then accumulate and reinforce each other at each further coarsening step.

[0063] The choice of the coarser aggregates and the nature of this coarsening process is such that strongly coupled aggregates join together to form yet coarser aggregates. A set of pixels with strong internal couplings but with weak external couplings is bound to result at some level of coarsening in one aggregate which is weakly coupled to all other aggregates of that level. Such an aggregate will indicate the existence of an image segment.

[0064] The “coarse couplings” relations (equation 7) can be somewhat simplified, yielding a similar coarsening process, named Iterated Weighted Aggregation (IWA). IWA consists of exactly the same steps as the AMG coarsening, except that the coarse couplings {A_kl} are calculated by the simpler formula $A_{kl} = \sum_{i \neq j} w_{ik} a_{ij} w_{jl}. \qquad (15)$

[0065] It can be shown that equation 15, coarse couplings (IWA), in many situations provides a good approximation to equation 7, coarse couplings (A). In certain cases the two processes are identical, e.g., in the case that each pixel is associated with only two blocks. Moreover, equation 15 (IWA) can be motivated by itself; it states that the coupling between two blocks is the sum of the couplings between the pixels associated with these blocks, weighted appropriately.
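A minimal sketch of the IWA rule of equation 15 follows. It assumes that each undirected fine coupling is stored once and is visited in both directions to realize the sum over i ≠ j, and that only couplings between distinct blocks (k ≠ l) are kept; both conventions, like the function name, are choices made for this illustration.

```python
def coarse_couplings(couplings, weights):
    """A_kl = sum over i != j of w_ik * a_ij * w_jl, for k != l (equation 15).

    couplings: {(i, j): a_ij}, each undirected pair stored once.
    weights:   n-by-K interpolation weights w_ik.
    """
    K = len(weights[0])
    A = [[0.0] * K for _ in range(K)]
    for (i, j), a in couplings.items():
        for p, q in ((i, j), (j, i)):        # both ordered pairs
            for k in range(K):
                if weights[p][k] == 0.0:
                    continue
                for l in range(K):
                    if k != l:               # assumed: skip self-couplings
                        A[k][l] += weights[p][k] * a * weights[q][l]
    return A

# Chain 0-1-2 with blocks at pixels 0 and 2; pixel 1 is shared equally.
couplings = {(0, 1): 2.0, (1, 2): 2.0}
weights = [[1.0, 0.0],
           [0.5, 0.5],
           [0.0, 1.0]]
A = coarse_couplings(couplings, weights)
```

The shared pixel contributes half of each of its arcs to the coupling between the two blocks, and the resulting matrix is symmetric.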

[0066] Based on the foregoing concepts, the present invention comprises a segmentation algorithm that is composed of two stages. In the first stage salient segments are detected, and in the second stage the exact boundaries of the segments are determined. The rest of this section describes the two stages.

[0067] Referring to the first stage, detecting the salient segments, the method is described starting from a given image. Initially each pixel is considered to be a node connected to its four immediate neighbors. Then, coupling values are assigned between each pair of neighbors. The coupling values a_ij are set to be a_ij = exp(−μ r_ij), where μ is a global parameter, and r_ij is an “edgeness” measure between i and j. Specifically, for horizontally spaced neighbors i and j the presence of an edge was tested in five orientations in the angular range −45° ≤ θ ≤ 45° about the vertical direction, each by differentiating two 3×1 masks whose centers are placed on i and j. Then r_ij is taken to be the maximal of the five responses.

[0068] Next, this graph is coarsened by performing iterated weighted aggregation. At each step of the coarsening, block pixels are first selected and then the couplings between the blocks are updated. Subsequently, a pyramidal structure is obtained that makes the optimal segments explicit.

[0069] In order to select the block pixels, first, the nodes (pixels) are ordered by the volume they represent. The nodes are sorted by bucketing to maintain linear runtime complexity. The first pixel is selected to be a block. Then, pixels are scanned according to this order and each is checked with respect to its degree of attachment to the previously selected blocks. Whenever a pixel is encountered that is weakly attached to the selected blocks, that pixel is added to the list of blocks.

[0070] Specifically, let C^(i−1) denote the set of blocks selected before a pixel i is tested. The following inequality is checked: $\max_{j \in C^{(i-1)}} a_{ij} \geq \tilde{\alpha} \sum_{l} a_{il}, \qquad (16)$

[0071] where α̃ is a parameter (typically α̃ = 0.1). Note that since generally a node is connected to a small number of neighbors, it must be coupled strongly to at least one of its neighbors. If the inequality is satisfied, C^(i) = C^(i−1) is set; otherwise C^(i) = C^(i−1) ∪ {i} is set. As a result of this process almost every pixel i ∉ C becomes strongly coupled to the pixels in C. The few remaining pixels are then added to C.
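The selection scan built on inequality 16 can be sketched as follows. This is a simplified illustration under stated assumptions: nodes are ordered by decreasing volume using Python's built-in sort rather than the bucketing described above, and the adjacency layout and the function name `select_blocks` are hypothetical.

```python
def select_blocks(neighbors, volumes, alpha_tilde=0.1):
    """Greedy block selection following inequality 16.

    neighbors: {i: {j: a_ij}}, a symmetric adjacency map.
    volumes:   {i: volume of node i}.
    A node joins the blocks when its strongest coupling to an already
    selected block is below alpha_tilde times its total coupling.
    """
    # Assumed ordering: largest volume first (a general sort, not bucketing).
    order = sorted(neighbors, key=lambda i: -volumes[i])
    blocks, selected = [], set()
    for i in order:
        total = sum(neighbors[i].values())
        strongest = max((a for j, a in neighbors[i].items() if j in selected),
                        default=0.0)
        attached = total > 0 and strongest >= alpha_tilde * total
        if not attached:
            blocks.append(i)
            selected.add(i)
    return blocks

# Chain 0-1-2-3 with a weak arc between pixels 1 and 2.
neighbors = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 0.01},
    2: {1: 0.01, 3: 1.0},
    3: {2: 1.0},
}
volumes = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
blocks = select_blocks(neighbors, volumes)
```

In this example the first node becomes a block automatically, node 1 is strongly attached to it, node 2 has no selected neighbor and therefore becomes a second block, and node 3 attaches to node 2.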

[0072] Next segmentation is effected. The couplings between the blocksare updated using equation 15 (coarse couplings, IWA), where the weightsw_(ik) are defined by equation 14 (weighted average coef.) In addition,the volume φ_(k) of each block is computed at this level using equation9 (coarse Phi). Next, it is determined if a block represents a salientsegment. The saliency of a segment is given by the ratio between the sumof its external couplings and its volume. When computing the saliency ofa block, however, one needs to take into account that every coarseningstep diminishes the external couplings of the segment by about a half.One can compensate for this reduction by multiplying this ratio by 2 tothe power of the level number. Thus, the saliency of a block k becomes${{\Gamma \left( U_{k} \right)} = {\frac{\sum A_{kl}}{\Phi_{k}^{\alpha}}2^{\sigma}}},$

[0073] where σ denotes the scale. Alternatively, one can use the volumeof the block as a measure of scale, in which case one obtains${{\Gamma \left( U_{k} \right)} = \frac{\sum A_{kl}}{\Phi_{k}^{\alpha - y}}},$

[0074] where γ can be set between 0.5 and 1 according to the ratio of pixels that survive each coarsening step (0.25 to 0.5, respectively). In the implementation, the blocks of the same scale are simply compared, and the ones whose saliency values are very low are detected. These blocks are then allowed to participate in forming larger blocks to obtain a hierarchical decomposition of the image into segments.

[0075] There follows the technique for sharpening segment boundaries. During the first steps of the algorithm a salient segment is detected as a single element at some level of the pyramid. It then remains to determine exactly which pixels of the original image (at the finest level) in fact belong to that segment. One way to determine which pixels belong to a segment is to compute recursively the degree of attachment of every pixel to each of the blocks in the pyramid. Unfortunately, the degrees of attachment computed this way will often produce “fuzzy” values, between 0 and 1, particularly near the boundaries of a segment, rendering the decision of the extent of a segment somewhat arbitrary. To avoid this fuzziness, the pyramid is scanned from coarse to fine, starting at the level at which a segment is detected, and relaxation sweeps are applied whose intent is to sharpen the boundaries of a segment. In the following, one example of this step of the algorithm is described.

[0076] Suppose a segment $S_{m}$ has been detected, and suppose that at a certain level (now called the “coarse level”) it has already been determined which pixels belong to $S_{m}$. It is now shown how to determine at the next finer level (now called the “fine level”) which pixels belong to $S_{m}$. Using the same notation as before, the coarse-level variables, $\{U_{k}^{(m)}\}_{k=1}^{K}$,

[0077] satisfy equation 12. Actually, along the boundaries of $S_{m}$ some $U_{k}^{(m)}$'s may assume values between 0 and 1. The task is to determine which pixels $\{u_{j}^{(m)}\}_{j=1}^{N}$

[0078] satisfy equation 1, but again allowing only pixels along the boundaries to obtain intermediate values between 0 and 1. Guided by the principle of minimizing $\Gamma(u^{(m)})$, a sharpening cycle consists of the following steps, iteratively changing $\tilde{u}^{(m)}$.

[0079] Two parameters $0<\delta_{1}<\delta_{2}<1$ are fixed, and $D_{x,y}$ is defined to be the set of all pixels i such that $x<\tilde{u}_{i}^{(m)}<y$ at the beginning of the cycle. Then $\tilde{u}^{(m)}$ is modified by setting $\tilde{u}_{i}^{(m)}=0$ for $i \in D_{0,\delta_{1}}$, setting $\tilde{u}_{i}^{(m)}=1$ for $i \in D_{\delta_{2},1}$, and leaving $\tilde{u}_{i}^{(m)}$ unchanged for $i \in D_{\delta_{1},\delta_{2}}$. This is followed by applying ν “Gauss-Seidel relaxation sweeps” over $D_{\delta_{1},\delta_{2}}$, where ν is another free parameter. Each such “relaxation sweep” is a sequence of steps aimed at lowering $E(\tilde{u}^{(m)})$. In each sweep, all the pixels in $D_{\delta_{1},\delta_{2}}$ are visited, in any order. For each pixel i, $\tilde{u}_{i}^{(m)}$ is replaced by the new value $\sum_{j} a_{ij}\tilde{u}_{j}^{(m)}/\sum_{j} a_{ij}$, that is, the value for which $E(\tilde{u}^{(m)})$ is lowered the most. Since the volume $V(\tilde{u}^{(m)})$ is only marginally affected, $\Gamma(\tilde{u}^{(m)})$ is lowered as well. Since at the beginning of this procedure only pixels around the boundaries have fuzzy values (because this procedure has already been applied at the coarser level), this relaxation procedure converges quickly. Hence, a small number of sweeps, ν, will generally suffice. In experiments, two relaxation sweeps were applied at every level with, e.g., $\delta_{1}=1-\delta_{2}=0.15$ in the first cycle and $\delta_{1}=1-\delta_{2}=0.3$ in the second cycle. The final $\tilde{u}^{(m)}$ is defined as the desired vector $u^{(m)}$.
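One sharpening cycle as described above can be sketched as follows. This is a minimal sketch under stated assumptions: the couplings are held in a dictionary of dictionaries, the function name `sharpening_cycle` and the toy data are hypothetical, and the thresholds default to the first-cycle values quoted in the text.

```python
def sharpening_cycle(u, couplings, delta1=0.15, delta2=0.85, sweeps=2):
    """One sharpening cycle: threshold, then Gauss-Seidel relaxation sweeps.

    u: dict pixel -> fuzzy membership in [0, 1] (assumed layout).
    couplings: dict pixel i -> {neighbor j: a_ij}.
    """
    u = dict(u)
    # D_{delta1, delta2}: pixels that are fuzzy at the beginning of the cycle.
    fuzzy = [i for i, value in u.items() if delta1 < value < delta2]
    for i, value in list(u.items()):
        if value < delta1:
            u[i] = 0.0
        elif value > delta2:
            u[i] = 1.0
    for _ in range(sweeps):
        for i in fuzzy:
            # Replace u_i by the coupling-weighted average of its neighbors,
            # using values already updated in this sweep (Gauss-Seidel).
            total = sum(couplings[i].values())
            u[i] = sum(a * u[j] for j, a in couplings[i].items()) / total
    return u


# Three pixels in a chain; the middle pixel is coupled more strongly to pixel 0.
couplings = {0: {1: 3.0}, 1: {0: 3.0, 2: 1.0}, 2: {1: 1.0}}
result = sharpening_cycle({0: 0.95, 1: 0.5, 2: 0.1}, couplings)
```

In this toy case the two end pixels are snapped to 1 and 0, and the middle pixel moves toward the side to which it is more strongly coupled.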

[0080] In the steps of the algorithm described above, the couplings at all levels are derived directly from the couplings between the pixels at the finest level. However, since each element at a coarse level represents an aggregate of pixels, information about the emerging segments that is not directly available at the finest level may be used to facilitate the segmentation process. Thus “observables” can be measured at the coarse levels, and used to increase or decrease the couplings between blocks obtained with the original algorithm. An example of such an observable is the average intensity of a block, which can be used to separate segments even when the transition between their intensity values is gradual, so that they are difficult to separate at the finest levels. The average intensity $G_{k}$ of a block k in the above coarsening step is defined as

$G_{k}=\sum_{i} w_{ik} g_{i}/\sum_{i} w_{ik},$

[0081] where $g_{i}$ denotes the intensity of pixel i. This observable can be calculated recursively at all coarser levels. Then, the couplings $A_{kl}$ computed by equation 15 (coarse couplings, IWA) may be replaced, e.g., by $A_{kl}\exp(-\mu|G_{k}-G_{l}|)$, where μ is a predetermined constant.
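The average-intensity observable and the resulting modification of a coupling can be illustrated with a short Python sketch. The function names, the dictionary layout, and the numerical values are assumptions chosen for the example; only the two formulas come from the text.

```python
import math


def block_intensity(weights_k, g):
    """G_k = sum_i w_ik g_i / sum_i w_ik for one block k.

    weights_k: dict pixel i -> w_ik (assumed layout); g: dict pixel -> intensity.
    """
    return sum(w * g[i] for i, w in weights_k.items()) / sum(weights_k.values())


def modulated_coupling(a_kl, g_k, g_l, mu):
    """Replace A_kl by A_kl * exp(-mu * |G_k - G_l|)."""
    return a_kl * math.exp(-mu * abs(g_k - g_l))


intensity = {0: 10.0, 1: 20.0}
g_block = block_intensity({0: 1.0, 1: 3.0}, intensity)  # (10 + 60) / 4
same = modulated_coupling(2.0, 5.0, 5.0, mu=0.1)        # equal intensities
weaker = modulated_coupling(2.0, 5.0, 15.0, mu=0.1)     # differing intensities
```

Blocks with equal average intensity keep their coupling unchanged, while a difference in average intensity weakens it exponentially.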

[0082] The number of observables per aggregate can increase at coarser levels. Other possible observables include the center of mass of a block, its diameter, principal orientations, texture measures, etc. Using these observables it is possible to incorporate quite elaborate criteria into the segmentation process. For example, strong couplings can be assigned between two aggregates whose orientations align with the direction of the line connecting their centers of mass (or when their boundaries co-align), even when these aggregates are separated by a gap and thus do not inherit any mutual couplings from finer levels.

[0083] Similarly (and perhaps more importantly), very strong couplings could be attributed between two horizontally neighboring aggregates whose “upper-weighted-average” boundary directions coincide. Using the above boundary sharpening procedure on any large-scale aggregate with proper parameters, the boundary direction of any part of the boundary will come out well defined. For each of the two aggregates, in the upper-weighted-average of these directions, larger weights will be assigned in regions closer to the upper side of the other aggregate, except that zero weights will be used along their “common” boundary (properly defined).

[0084] At every coarsening step a subset of the nodes is selected such that the remaining nodes are coupled strongly to at least one of the nodes. Following this selection procedure almost no two neighboring nodes can survive to the next level. Thus, at every level of scale about half the nodes are obtained from the previous level. The total number of nodes in all levels, therefore, is about twice the number of pixels.
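As a rough check of this count, assume (as an idealization; the text says only “about half”) that each coarsening step keeps exactly half of the nodes. With N pixels at the finest level, the total number of nodes over all levels is then bounded by a geometric series:

$$\sum_{s=0}^{\infty} \frac{N}{2^{s}} = N\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) = 2N.$$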

[0085] During the selection procedure there are two operations whose naive implementation may result in a nonlinear complexity. First, the nodes need to be ordered, say, according to their volumes. This can be done in linear time by dividing the range of possible volumes into a fixed number of buckets, since it is unnecessary to sort nodes whose volumes are similar. Furthermore, in the first few levels the nodes usually have similar volumes, and so this ordering does not need to be applied. Instead, the nodes can merely be scanned in some arbitrary order. Secondly, for every node its maximal connection to the selected blocks must be found; see equation 16 (coarse item max dependence). This operation can be implemented efficiently by noticing that every node need only consider its neighboring nodes, typically up to 8 nodes. Finally, computing the degree of attachment of the pixels to all the block variables can be done in one pass once the pyramid is complete.

[0086] The number of operations per pixel can be reduced significantly by replacing the first 1-3 coarsening steps by equivalent geometric coarsening. In these coarsening steps the same operations are performed, but the pixels selected as blocks are determined in advance to lie along a regular grid of twice the mesh size. (This may require adding some of the fine pixels to the coarse set to avoid inaccurate interpolations.) With this modification it is possible to reduce the execution time of the algorithm to only several dozen operations per pixel.

[0087] Examples of segmentation obtained using the method and apparatus of the present invention, including implementation of the algorithm, are described in the following. The implementation was not optimized and, for example, did not include geometric coarsening to reduce the number of operations per pixel. The implementation (written in C and run on an Intel 400 MHz Pentium II processor) took 60 seconds to segment a 200×200 image. The pyramid produced in this run contained about 73000 nodes (less than twice the number of pixels). Segmenting a 100×100 image took only 12 seconds.

[0088] The pictures illustrated in the Figures of the drawings demonstrate the application of the method and apparatus of the first embodiment of the invention, including the algorithm, to several real images. FIGS. 1a to 1d show baseball players in action. The input image is FIG. 1a. At the topmost scale, FIG. 1b, the picture was divided into two segments. At scale 8, FIG. 1c, five segments stood out, two capturing most of the bodies of the two players, one capturing the hand of one of the players, and one capturing the head of the other. At scale 7, FIG. 1d, smaller segments are obtained, separating some of the body parts of the two players.

[0089] FIGS. 2a to 2c show a lioness, original image FIG. 2a, that was decomposed at level 10, FIG. 2b, into four segments, one of which captures the lioness. At level 8, FIG. 2c, the bottom segment was further decomposed into three segments, splitting the cub and the stone from the ground. FIGS. 3a to 3c show a cow standing in a field. FIG. 3a is the input image. FIG. 3b shows the three segments obtained at scale 10, capturing the skies, the grass, and a single segment that includes the cow and the hilly background. At scale 9, FIG. 3c, the cow was separated from the hills, and the grass was split into two segments. FIGS. 4a to 4c show three cows (FIG. 4a being the original input image). At the coarsest scale, FIG. 4b, the grass was separated from the cows (except for the bright back of the rightmost cow, which was decomposed later from the grass). The three cows were then split, FIG. 4c (with the rightmost cow split into two segments). Body parts of the cows are obtained in the lower scale, FIG. 4d. Overall, these pictures demonstrate that the invention, including the algorithm, accurately finds the relevant regions in the images.

[0090] As described above, the method and apparatus of the invention provide a way to deal with data clustering and, more particularly, provide a fast, multiscale algorithm for image segmentation. The algorithm uses a process of recursive weighted aggregation to detect the distinctive segments at different scales. It finds an approximate solution to the normalized-cuts measure in time that is linear in the size of the image, with only a few dozen operations per pixel. The interpolation weights (see equation 14) can be improved, yielding a better approximation of the fine-level minimization problem by the coarser representations and allowing the coarser problems to be represented with fewer block pixels.

[0091] Ideally, the interpolation rule, equation 5, should yield a fine-level configuration u that satisfies the energy-minimization condition $\partial E(u)/\partial u_{i}=0$. Since E is quadratic in u, this condition can be written as $\begin{matrix}{{u_{i}^{(m)} = {{\sum\limits_{j \in C}{\hat{a}_{ij}u_{j}^{(m)}}} + {\sum\limits_{j \notin C}{\hat{a}_{ij}u_{j}^{(m)}}}}},} & (17)\end{matrix}$

[0092] where $\hat{a}_{ij}$ are the normalized couplings, defined by $\hat{a}_{ij} = a_{ij}/\sum_{l} a_{il}$.

[0093] Notice that the interpolation rule considers only the first terms in equation 17. Given any non-ideal interpolation weights $\{w_{ik}\}$, improved interpolation weights $\{\bar{w}_{ik}\}$ are given by $\begin{matrix}{\bar{w}_{ik} = {\hat{a}_{i c_{k}} + {\sum\limits_{j \notin C}{\hat{a}_{ij}w_{jk}}}}.} & (18)\end{matrix}$

[0094] This same rule can be reused recursively several times to create increasingly improved interpolation weights.
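The improvement rule of equation 18 can be sketched as follows. This is a hedged illustration only: blocks are identified here by the index of their representative pixel, the starting weights are assumed to interpolate each fine pixel solely from its directly coupled block, and the function names are hypothetical.

```python
def improve_weights(a_hat, w, coarse):
    """One application of equation (18) for every fine pixel i not in C.

    a_hat: dict i -> {j: normalized coupling a_hat_ij} (assumed layout).
    w: dict fine pixel j -> {block k: w_jk}, the current weights.
    coarse: set of block (coarse) pixels C.
    """
    improved = {}
    for i, row in a_hat.items():
        if i in coarse:
            continue
        new_row = {}
        for j, a in row.items():
            if j in coarse:
                # Direct term: normalized coupling of i to the block pixel.
                new_row[j] = new_row.get(j, 0.0) + a
            else:
                # Indirect term: route through fine neighbor j via w_jk.
                for k, w_jk in w[j].items():
                    new_row[k] = new_row.get(k, 0.0) + a * w_jk
        improved[i] = new_row
    return improved


def deficiency(a_hat, coarse, i):
    """d_i = sum of normalized couplings of i to pixels outside C."""
    return sum(a for j, a in a_hat[i].items() if j not in coarse)


# Chain 0 - 1 - 2 - 3 with equal couplings; pixels 0 and 3 are blocks.
a_hat = {1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}}
start = {1: {0: 1.0}, 2: {3: 1.0}}
better = improve_weights(a_hat, start, coarse={0, 3})
```

After one pass, pixel 1 also receives weight from the more distant block 3, which the starting weights ignored; repeating the pass refines the weights further.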

[0095] A measure of the “deficiency” $d_{i}$ of interpolating to pixel i with the interpolation weights of equation 14 is defined as the relative part of equation 17 that is ignored by the relation of equation 14, i.e.,

$d_{i}=\sum_{j \notin C} \hat{a}_{ij}.$

[0096] Similarly, given any interpolation weights $\{w_{ik}\}$ with deficiencies $\{d_{i}\}$, the improved interpolation weights $\{\bar{w}_{ik}\}$ created by equation 18 will have the deficiencies

$\bar{d}_{i}=\sum_{j \notin C} \hat{a}_{ij} d_{j}.$

[0097] Hence, with a reasonably dense set C, the deficiencies will be much reduced with each improvement, so that normally very few such improvements (if any) would be needed.

[0098] With the improved interpolation weights (see equation 18), the coarse-variable selection criterion, equation 11, can be relaxed, replacing it by the more general criterion $d_{i} \leq 1-\beta$. The condition of equation 16 (coarse item max dependence) can similarly be relaxed.

[0099] Also, for computational efficiency it is desired to keep the interpolation matrix $\{w_{ik}\}$ as sparse (containing as few non-zero terms) as possible. This is accomplished by replacing small weights ($w_{ik}<\xi$, ξ being another algorithm-control parameter; e.g., ξ=0.01) by zeros, and then renormalizing to maintain

Σ_(k) w _(ik)=1.
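The thresholding and renormalization step can be sketched for a single row of the interpolation matrix. The function name and the fallback for a row whose weights are all below the threshold are assumptions of this sketch, not part of the description.

```python
def sparsify_row(row, xi=0.01):
    """Drop weights below xi and renormalize so the row sums to 1.

    row: dict block k -> w_ik for one pixel i (assumed layout).
    """
    kept = {k: w for k, w in row.items() if w >= xi}
    if not kept:
        # Assumed fallback: keep the largest weight so the row stays usable.
        k_max = max(row, key=row.get)
        kept = {k_max: row[k_max]}
    total = sum(kept.values())
    return {k: w / total for k, w in kept.items()}


sparse = sparsify_row({0: 0.6, 1: 0.395, 2: 0.005})
```

Here the weight to block 2 is dropped and the remaining two weights are rescaled so that they again sum to 1.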

[0100] Now the second specific embodiment will be described in detail. As explained above, image segmentation is difficult because objects may differ from their background by any of a variety of properties that can be observed in some, but often not all, scales. A further complication is that coarse measurements, applied to the image for detecting these properties, often average over properties of neighboring segments, making it difficult to separate the segments and to reliably detect their boundaries. The method and apparatus for segmentation generate and combine multiscale measurements of intensity contrast, texture differences, and boundary integrity. The method is based on the method and apparatus described above utilizing an algorithm for segment weighted aggregation (SWA), which efficiently detects segments that optimize a normalized-cut-like measure by recursively coarsening a graph reflecting similarities between intensities of neighboring pixels. In this invention aggregates of pixels of increasing size are gradually collected to form segments. While doing so, properties of the aggregates are computed and the graph is modified to reflect these coarse scale measurements. This allows detecting regions that differ by fine as well as coarse properties, and accurately locating their boundaries. Furthermore, by combining intensity differences with measures of boundary integrity across neighboring aggregates, regions separated by weak, yet consistent edges can be detected.

[0101] As noted above, image segmentation methods divide the image into regions of coherent properties in an attempt to identify objects and their parts without the use of a model of the objects. In spite of many thoughtful attempts, finding a method that can produce satisfactory segments in a large variety of natural images has remained difficult. In part, this may be due to the complexity of images. Regions of interest may differ from surrounding regions by any of a variety of properties, and these differences can be observed in some, but often not in all, scales. In the following, an improved method and apparatus for segmentation utilizing an improved algorithm will be described. This improvement is based on the foregoing disclosure regarding segmentation. As in the foregoing, in the improved invention a multiscale structure is built to measure and incorporate various properties such as intensity contrast, isotropic texture, and boundary integrity. This results in an efficient method and apparatus that detects useful segments in a large variety of natural images.

[0102] Segments that differ by coarse scale properties introduce a specific difficulty to the segmentation process. Since initially one does not know the division of the image into segments, any coarse measurement must rely on an arbitrarily chosen set of pixels (“support”) that may often include pixels from two or more segments, particularly near the boundaries of segments. This may lead to significant over-smoothing of the measured properties and to blurring the contrast between segments, inevitably leading to inaccuracies in the segmentation process. On the other hand, since segments often differ only by coarse scale properties, such segments cannot be detected unless coarse measurements are made.

[0103] The improved method and apparatus solve this “chicken and egg problem.” They do so by building a pyramid structure over the image. The structure of the pyramid is determined by the content of the image. As the pyramid is constructed from bottom to top, segment fragments of increasing size are detected. These fragments are used as a support area for measuring coarse scale properties. The new properties are then used to further influence the construction of larger fragments (and eventually whole segments). Measuring properties over fragments avoids the over-smoothing of coarse measurements (as can be seen in FIG. 5), and so segments that differ in coarse scale properties usually stand out. Experiments demonstrate a considerable improvement over existing approaches. The process is very efficient. The runtime complexity of the algorithm is linear in the size of the image. As the pyramid is constructed, segment properties are computed recursively through integrals over the support area. The implementation (whose run-time may still be significantly reduced) applied to an image of 200×200 pixels takes about 5 seconds on a Pentium III laptop.

[0104] The method uses the segmentation by weighted aggregation (SWA) as a framework. This algorithm uses techniques from algebraic multigrid to find the segments in the image. Like other recent segmentation algorithms, the method optimizes a global measure, a normalized-cut type function, to evaluate the saliency of a segment. To optimize the measure, the algorithm builds an irregular pyramid structure over the image. The pyramid maintains fuzzy relations between nodes in successive levels. These fuzzy relations allow the algorithm to avoid local decisions and detect segments based on a global saliency measure.

[0105] Pyramids constructed over the image have been used for solving many problems in computer vision. Typically, the image is sampled at various scales and a bank of filters is applied to each sampling. Segmentation is generally obtained by going down the pyramid and performing split operations. Subsequent merge operations are applied to reduce the effect of over-segmentation. Over-smoothing introduces a serious challenge to these methods. A different kind of pyramid structure is built by agglomerative processes; however, these processes are subject to local, premature decisions.

[0106] Other approaches that attempt to reduce the problem of over-smoothing include the use of directional smoothing operators for multiscale edge detection. These operators are local and require an additional process for inferring global segments. Also of relevance are methods for smoothing using anisotropic diffusion. These methods avoid over-smoothing, but typically involve a slow, iterative process that is usually performed in a single scale.

[0107] The foregoing reviewed the SWA algorithm and described how to combine properties that involve integral measures over a segment in the segmentation process. This was demonstrated with two measures: the average intensity of a segment and its variance across scales. The use of boundaries in the segmentation process was also discussed in the foregoing. Finally, in the experiments given above, it was shown that the invention performed efficiently.

[0108] As discussed above, in the efficient multiscale algorithm for image segmentation, a 4-connected graph G=(V,E,W) is constructed from the image, where each node $v_{i} \in V$ represents a pixel, every edge $e_{ij} \in E$ connects a pair of neighboring pixels, and a weight $w_{ij}$ is associated with each edge, reflecting the contrast in the corresponding location in the image. The algorithm detects the segments by finding the cuts that approximately minimize a normalized-cut-like measure. This is achieved through a recursive process of weighted aggregation, which induces a pyramid structure over the image. This pyramid structure is used in this invention to define geometric support for coarse scale measurements, which are then used to facilitate the segmentation process.

[0109] Two slight modifications (improvements) are made to the SWA algorithm relative to its original implementation as described above. First, the normalization term in the optimization measure is changed, normalizing the cost of a cut by the sum of the internal weights of a segment, rather than by its area. Secondly, the graph is initialized with weights that directly reflect the intensity difference between neighboring pixels, rather than using the “edgeness measure” defined previously. Specifically, $w_{ij}=\exp(-\alpha|I_{i}-I_{j}|)$, where $I_{i}$ and $I_{j}$ denote the intensities in two neighboring pixels i and j, and α>0 is a constant.
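The initialization of the 4-connected graph with these weights can be sketched as follows. The representation of the image as a list of rows, the edge dictionary keyed by pixel-coordinate pairs, and the value of α are assumptions made for this example.

```python
import math


def build_graph(image, alpha=10.0):
    """Weights w_ij = exp(-alpha * |I_i - I_j|) on a 4-connected pixel grid.

    image: list of rows of intensities (assumed layout).
    Returns a dict {((r, c), (r2, c2)): w}, each undirected edge stored once.
    """
    rows, cols = len(image), len(image[0])
    weights = {}
    for r in range(rows):
        for c in range(cols):
            # Right and down neighbors cover every 4-connected edge once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    diff = abs(image[r][c] - image[r2][c2])
                    weights[((r, c), (r2, c2))] = math.exp(-alpha * diff)
    return weights


# A 2x2 image with a horizontal intensity edge between the two rows.
graph = build_graph([[0.0, 0.0], [1.0, 1.0]], alpha=1.0)
```

Edges within a row of equal intensity get weight 1, while edges that cross the intensity step get the smaller weight exp(-1).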

[0110] The SWA algorithm proceeds as follows. With every segment $S=\{s_{1}, s_{2}, \ldots, s_{m}\} \subset V$, a state vector $u=(u_{1}, u_{2}, \ldots, u_{n})$ ($n=|V|$) is associated, where $\begin{matrix}{u_{i} = \begin{cases}1 & \text{if } i \in S \\ 0 & \text{if } i \notin S\end{cases}.} & (1)\end{matrix}$

[0111] The cut associated with S is defined to be $\begin{matrix}{{E(S)} = {\sum\limits_{i \neq j}{w_{ij}\left( {u_{i} - u_{j}} \right)^{2}}},} & (2)\end{matrix}$

[0112] and the internal weights are defined by

N(S)=Σw _(ij) u _(i) u _(j).  (3)

[0113] The segments that yield small (and locally minimal) values for the functional

Γ(S)=E(S)/N ^(β)(S),  (4)

[0114] for a predetermined constant β>0, and whose volume is less than half the size of the image, are considered salient.
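Equations 1 to 4 can be evaluated directly for a candidate segment, as in the following sketch. It assumes that each undirected edge is stored once in a dictionary (so the sums run over unordered pairs) and that a segment with no internal weight is given infinite Γ; both choices, like the function name, are assumptions of the example.

```python
def cut_and_saliency(edges, segment, beta=1.0):
    """Return (E(S), N(S), Gamma(S)) for a segment S, following eqs. (1)-(4).

    edges: dict {(i, j): w_ij}, each undirected edge stored once (assumed).
    segment: set of nodes belonging to S.
    """
    cut = 0.0
    internal = 0.0
    for (i, j), w in edges.items():
        u_i = 1.0 if i in segment else 0.0  # state vector, equation (1)
        u_j = 1.0 if j in segment else 0.0
        cut += w * (u_i - u_j) ** 2         # equation (2)
        internal += w * u_i * u_j           # equation (3)
    gamma = cut / internal ** beta if internal > 0 else float("inf")
    return cut, internal, gamma


# Chain 0 - 1 - 2 - 3 with a weak link between nodes 1 and 2.
edges = {(0, 1): 1.0, (1, 2): 0.1, (2, 3): 1.0}
cut, internal, gamma = cut_and_saliency(edges, {0, 1})
```

Cutting the chain at its weak link gives a small Γ, which is what marks the segment {0, 1} as salient in this toy graph.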

[0115] The objective of the algorithm is to find the salient segments. To this end a fast transformation for coarsening the graph was introduced. This transformation produces a coarser graph with about half the number of nodes (variables), such that salient segments in the coarse graph can be used to compute salient segments in the fine graph using local processing only. This coarsening process is repeated recursively to produce a full pyramid structure. The salient segments emerge in this process as they are represented by single nodes at some level in the pyramid. The support of each segment can then be deduced by projecting the state vector of the segment to the finest level. This segmentation process runs in time that is linear in the number of pixels in the image.

[0116] The coarsening procedure proceeds recursively as follows. It begins with $G^{[0]}=G$. (The superscript denotes the level of scale.) Given a graph $G^{[s-1]}$, a set of coarse representative nodes $C \subset V^{[s-1]}=\{1, 2, \ldots, n\}$ is chosen, so that every node in $V^{[s-1]} \setminus C$ is strongly connected to C. A node is considered strongly connected to C if the sum of its weights to nodes in C is a significant proportion of its weights to nodes outside C. Assume, without loss of generality, that $C=\{1, 2, \ldots, N\}$. A coarser state vector $u^{[s]}=\{u_{1}^{[s]}, u_{2}^{[s]}, \ldots, u_{N}^{[s]}\}$ is now associated with C, so that $u_{k}^{[s]}$ denotes the state of the node k. Because the original graph is local, and because every node is strongly connected to C, there exists a sparse interpolation matrix $P^{[s-1,s]}$, with $\sum_{k} P_{ik}^{[s-1,s]} = 1$

[0117] for every i, that satisfies the following condition. Given $u^{[s]}$ for any salient segment S, the state vector $u^{[s-1]}$ associated with that segment is approximated well by the inter-scale interpolation

u ^([s−1]) ≅P ^([s−1,s]) u ^([s]).  (5)

[0118] The weights $\{P_{ik}^{[s-1,s]}\}_{k=1}^{N}$ are chosen to be proportional to $\{w_{ik}^{(m)}\}_{k=1}^{N}$

[0119] for any $i \notin C$, and $P_{ii}^{[s-1,s]} = 1$

[0120] for $i \in C$.

[0121] Every node $k \in C$ can be thought of as representing an aggregate of pixels. For s=1, for example, a pixel i belongs to the k-th aggregate with weight $P_{ik}^{[0,1]}$.

[0122] Hence, a decomposition of the image into aggregates can be obtained. Note that by the definition of $P^{[s-1,s]}$ aggregates will generally not include pixels from both sides of a sharp transition in the intensity. In the absence of such a sharp transition a pixel will typically belong to several surrounding aggregates with weights proportional to its coupling to the representative of each aggregate.

[0123] Only later, as information from much coarser levels joins in, will sharper membership of such pixels to aggregates or segments increasingly emerge. This is unlike agglomerative techniques, where pixels are definitely joined together based only on quite local information, which is often not reliable enough.

[0124] Equation 5 (Coarse-To-Fine) is used to generate a coarse graph $G^{[s]}=(V^{[s]}, E^{[s]}, W^{[s]})$, which is associated with the state vector $u^{[s]}$, where $V^{[s]}$ corresponds to the set of aggregates (1, 2, . . . , N), and the weights in $W^{[s]}$ are given by the “weighted aggregation” relation $\begin{matrix}{{w_{kl}^{[s]} = {{\sum\limits_{i \neq j}{P_{ik}^{[s-1,s]}w_{ij}^{[s-1]}P_{jl}^{[s-1,s]}}} + {\delta_{kl}{\sum\limits_{i}{P_{ik}^{[s-1,s]}w_{ii}^{[s-1]}}}}}},} & (6)\end{matrix}$

[0125] where $\delta_{kl}$ is the Kronecker delta. (The second term in this expression influences only the computation of the internal weight of an aggregate, and its role is to recursively accumulate those weights.) Finally, an edge $e_{kl}^{[s]} \in E^{[s]}$ is defined

[0126] if and only if k≠l and $w_{kl}^{[s]} \neq 0$. The coarse graph $G^{[s]}$ can be partitioned according to relations like equations 1-4 applied to the coarse state vector $u^{[s]}$, except that the internal weight, equation 3, should now also take into account the internal weights of the aggregates, so that $w_{kk}$ is generally not zero, and its value can be computed recursively using equation 6.
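The weighted aggregation relation of equation 6 can be sketched with plain dictionaries. The symmetric dictionary-of-dictionaries layout for the fine weights (with optional diagonal entries holding internal weights), the function name, and the toy interpolation matrix are assumptions made for this illustration.

```python
def coarsen_weights(w, P):
    """Coarse weights w_kl^[s] from fine weights and interpolation matrix P.

    w: dict i -> {j: w_ij}, symmetric; w[i][i], if present, is an internal weight.
    P: dict fine node i -> {coarse node k: P_ik}, each row summing to 1.
    Returns a dict {(k, l): w_kl}; entries with k == l are internal weights.
    """
    coarse = {}
    for i, row in w.items():
        for j, w_ij in row.items():
            if i == j:
                # Second term of (6): accumulate existing internal weights.
                for k, p_ik in P[i].items():
                    coarse[(k, k)] = coarse.get((k, k), 0.0) + p_ik * w_ij
            else:
                # First term of (6): P_ik * w_ij * P_jl over ordered pairs i != j.
                for k, p_ik in P[i].items():
                    for l, p_jl in P[j].items():
                        coarse[(k, l)] = coarse.get((k, l), 0.0) + p_ik * w_ij * p_jl
    return coarse


# Chain 0 - 1 - 2; nodes 0 and 2 are coarse, node 1 is shared equally.
w_fine = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
P = {0: {0: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {2: 1.0}}
w_coarse = coarsen_weights(w_fine, P)
```

In this example the shared pixel contributes both to the coupling between the two aggregates and to the internal weight of each aggregate.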

[0127] At the end of this process a full pyramid has been constructed. Every salient segment appears as an aggregate in some level of the pyramid. Therefore the saliency of every node is evaluated, and a top-down process is then applied to determine the location in the image of the salient ones. This is achieved for every segment by interpolating $u^{[s]}$ from the level at which the segment was detected downward to the finest, pixel level using equation 5. Sharpening sweeps are applied after interpolation at each level to determine the boundaries of the segment more accurately, as described below.

[0128] The objective now is to use the pyramidal structure created by the SWA algorithm in order to define the support regions for coarse measurements and to use these measurements to affect the constructed pyramid. There are two versions of this algorithm. In one version the coarse measurements affect only the construction of yet coarser levels in the pyramid. In the second version the coarse measurements are also used to affect lower levels of the pyramid. This will be explained subsequently. Without top-down processing, the algorithm proceeds as follows. At each step, a new level in the pyramid is constructed according to the process described previously. For every node in this new level, properties of the aggregate it represents are then computed. Finally, for every edge in this graph, the weights are updated to account for the properties measured in this level. This will affect the construction of the pyramid in the next levels.

[0129] Certain useful properties of regions are expressed through integrals over the regions. Such are, for example, statistics over the regions. Such properties are also easy to handle with the novel algorithm, since they can be computed recursively with the construction of the pyramid. With such measurements the overall linear complexity of the segmentation process is maintained. Below are examples of two such measures. The first measure is the average intensity of the regions. The average intensity allows segmenting regions whose boundaries are characterized by a gradual transition of the intensity level. This measure was discussed above. The second, more elaborate measure is the variance of regions. The variances of regions are collected at every scale, and the sets of variances obtained for neighboring aggregates are compared. This allows accounting for isotropic textures characterized by their second order statistics. As will be evident from experiments, this already allows handling pictures that include textured segments. A full treatment of texture in the future will require additional measurements that are sensitive to directional texture and perhaps to higher order statistics of segments.

[0130] The two measurements and their use in the segmentation process are described below. First, some notation. The matrix $\begin{matrix}{P^{[t,s]} = {\prod\limits_{q = t}^{s - 1}P^{[q,q+1]}}} & (7)\end{matrix}$

[0131] describes the interpolation relations between a scale t and a scale s, $0 \leq t < s$. Thus, $P_{ik}^{[t,s]}$

[0132] measures the degree to which the aggregate i of scale t belongs to the aggregate k of scale s.

[0133] Suppose $Q_{l}^{[t]}$ is an integral of a function over the aggregate l at scale t. Then, one can recursively compute the integral of that function over an aggregate k in any scale s>t by $\begin{matrix}{Q_{k}^{[t,s]} = {\sum\limits_{l}{P_{lk}^{[t,s]}Q_{l}^{[t]}}}.} & (8)\end{matrix}$

[0134] Using equation 7, this integral can be computed level by level by setting t=s−1 in equation 8. The average of $Q_{k}^{[t,s]}$ over the aggregate k can also be computed by $\begin{matrix}{{\bar{Q}_{k}^{[t,s]} = {Q_{k}^{[t,s]}/P_{k}^{[t,s]}}},} & (9)\end{matrix}$

[0135] where $P_{k}^{[t,s]} = \sum_{l} P_{lk}^{[t,s]}$,

[0136] the volume of the aggregate k at scale s given in units of aggregates at scale t, can also be computed recursively. In particular, $P_{k}^{[0,s]}$

[0137] is the volume of the aggregate k at scale s in pixels.
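One level of the recursive accumulation in equations 8 and 9 can be sketched as follows. The sketch assumes that the integrals and the pixel volumes of the finer level are already available as dictionaries and that the one-level interpolation matrix is stored row by row; the function name and data are hypothetical.

```python
def aggregate_level(P, Q, volume):
    """Carry integrals and volumes from one level to the next coarser level.

    P: dict fine aggregate l -> {coarse aggregate k: P_lk} (one-level matrix).
    Q: dict l -> integral of some function over aggregate l.
    volume: dict l -> volume of aggregate l in pixels.
    Returns (Q_coarse, volume_coarse); the average is Q_coarse[k] / volume_coarse[k].
    """
    Q_coarse, volume_coarse = {}, {}
    for l, row in P.items():
        for k, p_lk in row.items():
            Q_coarse[k] = Q_coarse.get(k, 0.0) + p_lk * Q[l]                     # eq. (8)
            volume_coarse[k] = volume_coarse.get(k, 0.0) + p_lk * volume[l]
    return Q_coarse, volume_coarse


# Three pixels with intensities 10, 20, 30; pixel 1 is shared by both aggregates.
P = {0: {0: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {2: 1.0}}
Q1, V1 = aggregate_level(P, Q={0: 10.0, 1: 20.0, 2: 30.0},
                         volume={0: 1.0, 1: 1.0, 2: 1.0})
mean_0 = Q1[0] / V1[0]  # average intensity of aggregate 0, as in equation (9)
```

Applying the same function again with the next interpolation matrix would carry the integrals one level further up, so each level costs time proportional to the number of non-zero entries of its interpolation matrix.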

[0138] Measuring the average intensity of a region is useful for detecting regions whose intensity falls off gradually near the boundary, or when the boundary is noisy. Let $\mu_{k}^{[s]}$ denote the average intensity of the aggregate k at scale s. Equations 8 and 9 can be used to compute $\mu_{k}^{[s]}$ recursively, starting with $Q_{i}^{[0]}=I_{i}$.

[0139] In the construction of a new level s, the weights $w_{kl}^{[s]}$ are generated according to equation 6 using the fine-scale weights. $w_{kl}^{[s]}$ is then modified to account also for intensity contrast between the aggregates k and l by multiplying it by $e^{-\alpha_{s}|\mu_{k}^{[s]} - \mu_{l}^{[s]}|}$.

[0140] This parameter, $\alpha_{s}$, can be tuned to prefer certain scales over others, say, according to prior knowledge of the image. In the implementation it is set

[0141] $\alpha_{s} \equiv \tilde{\alpha}$

[0142] for a fixed $\tilde{\alpha}>0$. As a result of this modification, the subsequent construction of the coarser levels of the pyramid is affected by the contrast at level s and at all finer scales. This enables detection of significant intensity transitions seen at any level of scale.

[0143] The variance of image patches is a common statistical measure used to measure texture. Variance is useful in characterizing isotropic surface elements. Additional statistics are often used to characterize more elaborate textures. In the present invention, the average variances at all finer scales are used to relate between aggregates. Other statistics can be incorporated as a step in the method in a similar way.

[0144] To compute the variance of an aggregate according to the present invention, the average squared intensity of any aggregate k at any scale s is accumulated, denoted by

[0145] $\overline{I^{2}}_{k}^{[s]}$.

[0146] This too is done recursively, starting with $Q_{i}^{[0]}=I_{i}^{2}$. The variance of an aggregate k at a scale s is then given by

$v_{k}^{[s]} = \overline{I^{2}}_{k}^{[s]}-(\mu_{k}^{[s]})^{2}.$
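The variance formula can be checked on a single aggregate with a few lines of Python. The sketch assumes that the membership weights of the pixels in the aggregate (the entries of $P^{[0,s]}$ for that aggregate) are already known; the function name and the numbers are illustrative only.

```python
def aggregate_variance(membership, intensity):
    """Variance of one aggregate: mean of I^2 minus the squared mean of I.

    membership: dict pixel i -> weight with which i belongs to the aggregate.
    intensity: dict pixel i -> intensity I_i.
    """
    volume = sum(membership.values())
    mean = sum(p * intensity[i] for i, p in membership.items()) / volume
    mean_sq = sum(p * intensity[i] ** 2 for i, p in membership.items()) / volume
    return mean_sq - mean ** 2


# Two pixels with intensities 1 and 3 that belong fully to the aggregate.
variance = aggregate_variance({0: 1.0, 1: 1.0}, {0: 1.0, 1: 3.0})
```

Because both the mean and the mean of squares are weighted sums, they can be accumulated level by level in the same way as any other integral over the aggregate.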

[0147] By itself, the variance of an aggregate measured with respect to its pixels provides little information about texture. Additional information characterizing the texture in an aggregate can be obtained by measuring the average variance of its sub-aggregates. This is denoted by $\bar{v}_k^{[t,s]}$,

[0148] the average of $v_i^{[t]}$ over all the sub-aggregates of k of scale t ($t < s$). $\bar{v}_k^{[t,s]}$

[0149] can be computed recursively, beginning at scale t by setting $Q_l^{[t]} = v_l^{[t]}$ in equation 8. The multiscale variance associated with an aggregate k in scale s, then, is described by the vector $\vec{v}_k^{[t,s]} = \left( \bar{v}_k^{[1,s]}, \bar{v}_k^{[2,s]}, \ldots, \bar{v}_k^{[s-1,s]}, v_k^{[s]} \right)$.

[0150] In the construction of a level s in the pyramid the multiscale variance vector is used to modify the weights in the graph $G^{[s]}$. For every pair of connected nodes k and l in $V^{[s]}$, $w_{kl}^{[s]}$

[0151] is multiplied by $e^{-\beta_s D_{kl}^{[s]}}$,

[0152] where $D_{kl}^{[s]}$

[0153] is the Mahalanobis distance between $\bar{v}_k^{[s]}$ and $\bar{v}_l^{[s]}$, which can be set so as to prefer certain scales over others. In general, these modifications are performed only from a certain scale T and up. This enables accumulation of aggregates of sufficient size that contain rich textures.
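The texture-based modification can be sketched as follows. The text does not state how the covariance used in the Mahalanobis distance is obtained, so the diagonal covariance below is an assumption, as are the function names, the sample vectors and the value of `beta`.

```python
import numpy as np

def mahalanobis(u, v, cov):
    """Mahalanobis distance between vectors u and v under covariance cov."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def texture_factor(var_k, var_l, cov, beta):
    """Multiplicative factor exp(-beta * D_kl) applied to the coupling w_kl."""
    return float(np.exp(-beta * mahalanobis(var_k, var_l, cov)))

# Hypothetical multiscale variance vectors for three aggregates.
v_a = np.array([0.10, 0.20, 0.30])
v_b = np.array([0.11, 0.19, 0.31])
v_c = np.array([0.80, 0.90, 0.70])
cov = np.diag([0.04, 0.04, 0.04])  # assumed covariance, one entry per scale

similar = texture_factor(v_a, v_b, cov, beta=1.0)
different = texture_factor(v_a, v_c, cov, beta=1.0)
```

Scaling the entries of the covariance differently per scale is one way the distance could be made to prefer certain scales over others.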

[0154] The multiscale variance of an aggregate can detect isotropic texture. To account for non-isotropic texture one may aggregate recursively the covariance matrix of each aggregate, and use it to infer its principal axis indicating its direction and oblongness. By computing statistics of the direction and oblongness of sub-aggregates at every finer scale, a multiscale description can be obtained of the texture pattern in the aggregate.

[0155] Smooth continuation of boundaries is a strong cue that often indicates the presence of a single object on one side of the boundary. In the present invention this cue is used to facilitate the segmentation process. In this regard, the method proceeds as follows. During the construction of the pyramid, for every aggregate, sharp (as opposed to blurry) sections of its boundary are identified. Then, every two neighboring aggregates are compared to determine whether their boundaries can be connected with a smooth curve. If a clear smooth continuation is found, the weight between the aggregates is increased. Consequently, two such aggregates are more likely to be merged in the next level of the pyramid even when there is variation in their intensities.

[0156] Identifying the boundaries of an aggregate requires top-down processing of the pyramid. At each level of the pyramid, for every aggregate in that level, the sharp sections of its boundary are determined by looking at its sub-aggregates several levels down the pyramid. In general, the method goes down a constant or fixed number of levels, keeping the resolution of the boundaries proportional to the size of the aggregate. This will somewhat increase the total runtime of the algorithm, but the asymptotic complexity will remain linear. The effort is worth the extra cost, since boundary cues can help avoid the over-fragmentation of images, which is a common problem in many segmentation algorithms.

[0157] The fact that the method only considers boundary completion between neighboring aggregates allows consideration of only candidates that are likely to produce segment boundaries and cuts down the combinatorics that stalls perceptual grouping algorithms. This has two important consequences. First, it keeps the overall complexity of the segmentation process low. Secondly, it eliminates candidates that may produce smooth continuations, but otherwise are inconsistent with the segments in the image. This simplifies the decisions made by the segmentation process and generally leads to more accurate segmentation. It should be noted, however, that the boundary process of the inventive method is intended to facilitate the segmentation process and not to deal with pictures that contain long subjective contours as most perceptual grouping algorithms do.

[0158] Before explaining the details of how boundaries are used in the segmentation method, one important step in the extraction of boundaries will be explained. This is a top-down step whose purpose is to make the boundaries of an aggregate in the pyramid sharper.

[0159] Every time a new level in the pyramid is constructed, a process of top-down sharpening of the boundaries of aggregates is also performed by readjusting the weights two levels down and then updating the higher levels according to these adjustments. The reason for this step is as follows. Recall that every level of the pyramid is constructed by choosing representative nodes from the previous levels. Thus, every aggregate in a level s is identified with a single sub-aggregate in the preceding level s−1. This sub-aggregate belongs to the coarser aggregate with interpolation weight 1 (see equation 5). By recursion, this means that the aggregate of level s is identified with a single pixel in the image. As the pyramid is coarsened, this may introduce a bias, since pixels in the aggregate that are far from the representative pixel may be weakly related to the aggregate merely because of their distance from the representative pixel. To remedy this, a top-down sharpening step or procedure is performed in which, for every aggregate, nodes are identified in the lower levels that clearly belong to the aggregate. Then, the interpolation weight for such nodes is increased considerably. This results in extending the number of pixels that are fully identified with the segment, and as a consequence in restricting the fuzzy transitions to the boundaries of a segment.

[0160] The step or procedure of sharpening is performed as follows. Consider an aggregate k at scale s. The aggregate is associated with the state vector $u^{[s]}$ by assigning 1 at its k-th position and 0 elsewhere. Equation 5 tells how each node of scale s−1 depends on k. Considering the obtained state vector $u^{[s-1]}$, a modified vector $\tilde{u}^{[s-1]}$ is defined by $\tilde{u}_i^{[s-1]} = \begin{cases} 1 & \text{if } u_i > 1 - \delta_2 \\ u_i & \text{if } \delta_1 \le u_i \le 1 - \delta_2 \\ 0 & \text{if } u_i < \delta_1 \end{cases} \qquad (10)$

[0161] for some choice of $0 \le \delta_1, \delta_2 \le 1$. One recommended choice is to use $\delta_1 = \delta_2 = 0.2$. This process is repeated recursively using $\tilde{u}^{[s-1]}$ until a scale t is reached (typically t=s−2). Once t is reached, the obtained state vector $\tilde{u}^{[t]}$ is examined. For every pair of nodes i and j at scale t for which $\tilde{u}_i^{[t]} = \tilde{u}_j^{[t]} = 1$, the weight between the nodes is doubled. This makes those nodes belong to the aggregate much more strongly than the rest of the nodes. This procedure is repeated for every aggregate at scale s, obtaining a new weight matrix $W^{[t]}$.
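The thresholding of equation 10 and the subsequent weight doubling can be sketched in Python. The sketch covers only these two steps; the interpolation from level to level (equation 5) is not shown here, and the function names and the sample state vector are hypothetical.

```python
import numpy as np

def sharpen_state(u, delta1=0.2, delta2=0.2):
    """Apply equation 10: snap near-1 entries to 1 and near-0 entries to 0."""
    u = np.asarray(u, dtype=float)
    out = u.copy()
    out[u > 1.0 - delta2] = 1.0
    out[u < delta1] = 0.0
    return out

def double_core_weights(weights, u_sharp):
    """Double w[i, j] for every pair i != j with u_sharp[i] == u_sharp[j] == 1."""
    weights = np.array(weights, dtype=float)  # work on a copy
    core = np.asarray(u_sharp) == 1.0
    mask = np.outer(core, core)
    np.fill_diagonal(mask, False)
    weights[mask] *= 2.0
    return weights

# Hypothetical degrees of attachment of five nodes to one aggregate.
u = np.array([0.95, 0.85, 0.5, 0.1, 0.0])
u_sharp = sharpen_state(u)

# All-to-all unit couplings, used only to make the effect visible.
w = np.ones((5, 5)) - np.eye(5)
w_new = double_core_weights(w, u_sharp)
```

Here nodes 0 and 1 are fully identified with the aggregate, so only the coupling between them is doubled; node 2 keeps its intermediate value and remains part of the fuzzy transition.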

[0162] Using the new weight matrix $W^{[t]}$, the pyramid is rebuilt from levels t+1 and up. This in effect will modify the interpolation matrices and the weights at the coarser levels. As a result a sharper distinction is obtained between the aggregates, where coarse level measurements affect the interpretation of the image in finer scales.

[0163] A similar mechanism was used in the procedures first described above as a post-processing stage to determine the boundaries of salient segments in the image. In this improved method, this procedure is applied throughout the bottom-up pyramid construction. As a consequence coarse measurements influence the detection of segments already at fine scales.

[0164] Next it is explained how boundaries facilitate the segmentation method. Given an aggregate k at scale s (denoted by S), this step begins again with the characteristic state vector $u^{[s]}$ by assigning 1 to $u^{[s]}$ at the k-th position and 0 elsewhere. The sharpening procedure described above is applied a constant number of levels l down the pyramid to obtain the corresponding state vector $\tilde{u}^{[s-l]}$. Since every variable $u_i^{[s-l]}$ is associated with a pixel in the image, this vector indicates which of the corresponding pixels belong to S and to what degree. Hence, there is a (non-uniform) sampling of image pixels $\tilde{u}^{[s-l]}$, with their degree of belonging to S. The scale s−l determines the density of the sampling. The lower this scale is (larger l), the smaller are the corresponding aggregates and, hence, the denser are the corresponding pixels in the image. The state vector is treated as an image, with values assigned only to the pixels $\tilde{u}^{[s-l]}$. Pixels with high values in $\tilde{u}^{[s-l]}$ belong to S, whereas pixels with low values belong outside S. Then, sharp boundaries in this image are sought, at the resolution imposed by the density of the pixels $\tilde{u}^{[s-l]}$, by looking for sharp transitions in the values of these pixels.

[0165] To locate the boundaries, for each pixel in $\tilde{u}^{[s-l]}$, its difference from the average value of its neighbors (also in $\tilde{u}^{[s-l]}$) is measured. That is, for example, applying a Marr and Hildreth-like operator to only the pixels of $\tilde{u}^{[s-l]}$. The resulting values are thresholded to obtain edge pixels. This step is followed by a step of edge tracing in which line segments are best fit (in the $l_2$-norm sense) to the edge pixels. Finally, a polygonal approximation of the aggregate boundary is produced. The line segments obtained in this process are in fact oriented vectors; a clockwise orientation is maintained by keeping track of the direction to the inside of the aggregate. The size of the aggregate determines the density of the edge pixels. Note that the edges obtained may be fragmented; gaps may still be filled in at a coarser scale. The total complexity of the algorithm remains linear because during the process of boundary extraction descent only occurs a constant number of levels and the number of pixels accessed falls off exponentially as one climbs higher in the pyramid.
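The first two steps, measuring each sample's difference from its neighbors' average and thresholding, can be sketched in Python. The neighbor lists, the threshold value and the function name `edge_samples` are assumptions made for illustration; edge tracing and the polygonal approximation are not covered.

```python
import numpy as np

def edge_samples(values, neighbors, threshold):
    """Flag samples whose value differs from their neighbors' mean by more than threshold.

    values: 1-D array, degree of belonging of each sampled pixel to the segment.
    neighbors: list of index lists; neighbors[i] holds the samples adjacent to i.
    """
    values = np.asarray(values, dtype=float)
    response = np.zeros_like(values)
    for i, nbrs in enumerate(neighbors):
        if nbrs:  # isolated samples keep a zero response
            response[i] = abs(values[i] - values[list(nbrs)].mean())
    return response > threshold

# Six samples along a line: the first three lie inside the segment.
values = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
is_edge = edge_samples(values, neighbors, threshold=0.25)  # assumed threshold
```

Only the two samples on either side of the transition are flagged, which is the behavior expected of a sharp boundary section.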

[0166] When the top-down interpolation process described above is repeated for each aggregate k at scale s, there is no need to go all the way down to the finest scale. It is enough to go down only a constant number of scales l, as described above, obtaining the corresponding state vector $\tilde{u}^{[t]}$, t=s−l. Since every variable $u_i^{[t]}$ is associated with a pixel in the image, a generalized Marr-Hildreth-like operator may be applied to those pixels according to their neighboring relations in the image. From this a polygonal approximation to the aggregate's sharp boundary sections is derived. In this top-down interpolation process every aggregate is described at scale t by a fixed number of pixels (set by l). Since the number of aggregates is reduced with scale by roughly a constant factor, the overall complexity of the algorithm remains linear.

[0167] After the method determines the boundaries of each aggregate in the level s, the method examines every two neighboring aggregates to determine whether their boundaries form a smooth continuation. To this end a step is employed to first sort the vectors obtained for each aggregate according to their orientation (from 0 to 2π). Then, for every two aggregates the method can quickly identify pairs of vectors of similar orientation by merging the two lists of sorted vectors. The method then matches two vectors if they satisfy three conditions: (1) they form a good continuation, (2) the two corresponding aggregates have relatively similar properties (e.g., intensity and variance), and (3) no other vector of the same aggregate forms a better continuation. The last condition is used to eliminate accidental continuations that appear as Y-junctions. To evaluate whether two vectors form a good continuation, the improved method uses the measure proposed below.
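The sort-and-merge step for finding vectors of similar orientation can be sketched as follows. The tolerance `tol`, the representation of boundary vectors as (dx, dy) tuples and the function names are assumptions; the sketch also ignores the wrap-around between angles near 0 and near 2π, and it does not apply the three matching conditions.

```python
import math

def orientation(vec):
    """Angle of a 2-D vector (dx, dy) in [0, 2*pi)."""
    return math.atan2(vec[1], vec[0]) % (2.0 * math.pi)

def similar_orientation_pairs(vectors_a, vectors_b, tol):
    """Return index pairs (i, j) whose orientations differ by at most tol."""
    a = sorted((orientation(v), i) for i, v in enumerate(vectors_a))
    b = sorted((orientation(v), j) for j, v in enumerate(vectors_b))
    pairs, start = [], 0
    for angle_a, i in a:
        # Advance the window start; it never moves backwards (merge-style scan).
        while start < len(b) and b[start][0] < angle_a - tol:
            start += 1
        k = start
        while k < len(b) and b[k][0] <= angle_a + tol:
            pairs.append((i, b[k][1]))
            k += 1
    return pairs

# Boundary vectors of two hypothetical neighboring aggregates.
vectors_a = [(1.0, 0.05), (0.0, 1.0)]
vectors_b = [(0.0, 1.0), (1.0, 0.0), (-1.0, 0.0)]
pairs = similar_orientation_pairs(vectors_a, vectors_b, tol=0.1)
```

Because both lists are sorted once, candidate pairs are found in a single pass over the two lists instead of comparing every vector of one aggregate with every vector of the other.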

[0168] In a specific example of the measure noted above, an elastica measure is employed that penalizes deviations from smoothness, but takes into account in addition the lengths of the two vectors. The measure is a product of two exponents, one that penalizes for the distance r between the two vectors, and the other that penalizes for the angular difference between them. In these expressions the angles that the two vectors form with the line connecting their centers are taken into consideration, as are the lengths and widths of the vectors. In the current implementation of the method only vectors of width 1 are considered, but the width may be changed. Allowing for larger values will also handle thick edges. This measure allows long vectors to connect over larger gaps, but penalizes them more severely if they change their orientations.

[0169] If a pair of vectors is found that forms a good continuation according to this measure, the weight between the two aggregates is increased to become equal to their highest weight to any other node. This way the system is encouraged to merge the two aggregates in the next level of the pyramid. One can of course consider moderating this intervention in the process to balance differently between boundary integrity and other properties.
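The weight increase can be sketched in Python under one reading of the text: the coupling between the two aggregates is raised to the largest coupling that either of them has with any other node, and is never lowered. This interpretation, the dense matrix representation and the function name `boost_coupling` are assumptions.

```python
import numpy as np

def boost_coupling(weights, k, l):
    """Raise w[k, l] to the largest coupling that k or l has with any other node."""
    weights = np.array(weights, dtype=float)  # work on a copy
    others = [n for n in range(weights.shape[0]) if n not in (k, l)]
    best = max(weights[k, others].max(), weights[l, others].max())
    boosted = max(weights[k, l], best)  # never decrease an existing coupling
    weights[k, l] = weights[l, k] = boosted
    return weights

# Hypothetical symmetric couplings between four aggregates.
w = np.array([[0.0, 0.2, 0.9, 0.1],
              [0.2, 0.0, 0.3, 0.5],
              [0.9, 0.3, 0.0, 0.4],
              [0.1, 0.5, 0.4, 0.0]])
w_new = boost_coupling(w, 0, 1)
```

After the call, aggregates 0 and 1 are coupled as strongly as aggregate 0 is to its best other neighbor, which makes their merger at the next level more likely.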

[0170] As with the variances, these steps are applied from some level T and up, so that the aggregates are large enough to form significantly long boundaries.

[0171] The process of boundary completion can be extended to detect a-modal completions. To this end remote aggregates may be compared and the weights between them increased if (1) after completion their boundaries can be connected smoothly, and (2) they are separated by a salient (foreground) segment. This step is referred to as “topological subtraction of detected foreground segments.”

[0172] The method and apparatus of the present invention have been tested on various natural images. Shown in FIGS. 5-12 are a few examples. To get a sense of the advantages of the inventive method two other algorithms were applied to the same images for comparison purposes. First, the SWA algorithm described above was tested to demonstrate the enhancements achieved by incorporating additional coarse measurements. This implementation used a 4-connected graph, but incorporated only coarse scale average-intensity measures, in a manner similar to that described regarding average intensity. In addition, an implementation of the known normalized cuts algorithm was tested. In this implementation a graph is used in which every node is connected to nodes up to a radius of 30 pixels. The algorithm combines intensity, texture and contour measurements. In both cases original software written for these algorithms was used. Due to the large number of parameters, the tests were limited regarding the variety of settings. However, it is believed that the results obtained in all cases are fairly typical of the performance that can be obtained with the algorithms.

[0173] FIGS. 5a to 5d contrast the effect of averaging over aggregates with the effect of “geometric” averaging that ignores the regions in the image. In the middle row the image is tiled with 10×10 pixel squares, and each pixel is assigned the average intensity of the square to which it belongs. In the bottom row every pixel is assigned the average intensity of the aggregate it belongs to, FIG. 5a. To construct this image the method used aggregates of level 6. All together there were 414 aggregates of approximately 73 pixels each. FIG. 5b shows “reconstructions” of the original image through interpolation. Notice that averaging over squares leads to a blurry image, whereas the inventive method preserves the discontinuities in the image. In particular notice how with the geometric averaging the horse's belly blends smoothly into the grass, whereas with the inventive method it is separated from the grass by a clear edge, see FIGS. 5a to 5d.

[0174] FIGS. 6a to 6d through FIGS. 9a to 9c show four images of animals in various backgrounds. FIGS. 6-9 show the results of running the three algorithms on these images. Segmentation results are shown as gray color-thick contours-color overlays on top of the original gray scale images. In all four figures the algorithm of the inventive method manages to detect the animal as a salient region. In FIGS. 8a to 8d showing a tiger, in particular, the inventive method manages to separate the tiger from the background although it does not use any measure of oriented texture. In addition, at the second, finer level the inventive method also separates the bush from the water. The SWA algorithm barely finds the horse in FIG. 6 and has some “bleeding” problems with the tiger. This is probably because the algorithm does not incorporate a texture measure in the segmentation process. The normalized cuts algorithm yields significant over-fragmentation of the animals, and parts of the animals often merge with the background. This is typical of many existing known segmentation algorithms. Another example is shown in FIGS. 10a and 10b showing a village against a background of a mountain and a lake. Notice that the village, the mountain, and the lake are separated by the inventive method.

[0175] The experiments were concluded with two famous examples of camouflage images. FIGS. 11a to 11c show a squirrel climbing a tree. The inventive method including the novel algorithm finds the squirrel and its tail as the two most salient segments. The tree trunk is over-segmented, possibly due to the lack of use of oriented texture cues. The normalized cuts algorithm, for comparison, shows significant amounts of “bleeding.” Finally, in FIGS. 12a to 12b a Dalmatian is shown sniffing the ground against a black and white setting. The inventive method using the novel algorithm extracts the head and belly of the Dalmatian dog, and most of its body is detected with some “bleeding.” Such a segmentation can perhaps be used as a precursor for attention in this particularly challenging image.

[0176] The normalized cuts technique was significantly slower than the other two methods, SWA and the improved method. Running the normalized cuts method on a 200×200 pixel image using a dual processor Pentium III 1000 MHz took 10-15 minutes. The implementation of the inventive method applied to an image of the same size took about 5 seconds on a Pentium III 750 MHz laptop.

[0177] The inventive method and apparatus employ a segmentation algorithm that incorporates different properties at different levels of scale. The algorithm avoids the over-averaging of coarse measurements, which is typical in many multiscale methods, by measuring properties simultaneously with the segmentation process. For this process it uses the irregular pyramid proposed in the first embodiment to approximate graph-cut algorithms. The process of building the pyramid is efficient, and the measurement of properties at different scales integrates with the process with almost no additional cost. The algorithm is demonstrated by applying it to several natural images and comparing it to other, state-of-the-art algorithms. Experiments show that the inventive method and novel algorithm achieve dramatic improvement in the quality of the segmentation relative to the tested methods. The algorithm can further be improved by incorporating additional measures, e.g., of oriented texture. Moreover, the multiscale representation of the image obtained with the pyramid can be used to facilitate high level processes such as object recognition.

[0178] In FIG. 13 there is shown a flow chart in generalized form that shows the general steps of the method, all of which have been elaborated in detail in the foregoing description of the detailed specific embodiments. As shown, in step S1 a graph is constructed from an image. In step S2 some pixels of the image are selected as block pixels, with unselected pixels coupled to a block pixel, i.e. a neighboring pixel as described in the foregoing. This establishes aggregates. Next, in step S3 the graph is coarsened through a procedure of iterative weighted aggregation in order to obtain a hierarchical decomposition of the image and to form a pyramid over the image. During the iterations, as described in detail in the foregoing, the aggregates are agglomerated into larger aggregates according to the rules provided and as detailed in the foregoing. In this fashion it becomes possible to identify segments in the pyramid in step S4. Then, in step S5 the saliency of segments in the pyramid is detected, and in step S6 the segments are sharpened to determine their boundaries more accurately. This is a generalized description of the method, and as will be evident from the foregoing, there is considerable detail concerning each step, as elaborated in the foregoing.

[0179] The present invention (i.e., the system described in detail in this description of specific embodiments and as generally depicted in FIG. 13, or any part thereof) may be implemented using hardware, software or a combination thereof and may be implemented in one or more computer systems or other processing systems, and the capability would be within the skill of one ordinarily skilled in the art of programming of computers from the teachings and detailed disclosure provided in the foregoing. In fact, an example of a computer system 500 is shown in FIG. 14. The computer system 500 represents any single or multi-processor computer. In conjunction, single-threaded and multi-threaded applications can be used. Unified or distributed memory systems can be used. Computer system 500, or portions thereof, may be used to implement the present invention. For example, the system 100 of the present invention may comprise software running on a computer system such as computer system 500.

[0180] In one example, the system and method of the present invention is implemented in a multi-platform (platform independent) programming language such as Java, programming language/structured query language (PL/SQL), hyper-text mark-up language (HTML), practical extraction report language (PERL), Flash programming language, common gateway interface/structured query language (CGI/SQL) or the like. Java-enabled and JavaScript-enabled browsers are used, such as Netscape, HotJava, and Microsoft Explorer browsers. Active content web pages can be used. Such active content web pages can include Java applets or ActiveX controls, or any other active content technology developed now or in the future. The present invention, however, is not intended to be limited to Java, JavaScript, or their enabled browsers, and can be implemented in any programming language and browser, developed now or in the future, as would be apparent to a person skilled in the relevant art(s) given this description.

[0181] In another example, the system and method of the present invention may be implemented using a high-level programming language (e.g., C++) and applications written for the Microsoft Windows NT or SUN OS environments. It will be apparent to persons skilled in the relevant art(s) how to implement the invention in alternative embodiments from the teachings herein.

[0182] Computer system 500 includes one or more processors, such as processor 544. One or more processors 544 can execute software implementing the routines described above, such as shown in FIG. 13 and described in the foregoing. Each processor 544 is connected to a communication infrastructure 542 (e.g., a communications bus, cross-bar, or network). Various software embodiments are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.

[0183] Computer system 500 can include a display interface 502 that forwards graphics, text, and other data from the communication infrastructure 542 (or from a frame buffer not shown) for display on the display unit 530.

[0184] Computer system 500 also includes a main memory 546, preferably random access memory (RAM), and can also include a secondary memory 548. The secondary memory 548 can include, for example, a hard disk drive 550 and/or a removable storage drive 552, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 552 reads from and/or writes to a removable storage unit 554 in a well known manner. Removable storage unit 554 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 552. As will be appreciated, the removable storage unit 554 includes a computer usable storage medium having stored therein computer software and/or data.

[0185] In alternative embodiments, secondary memory 548 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 500. Such means can include, for example, a removable storage unit 562 and an interface 560. Examples can include a program cartridge and cartridge interface (such as that found in video game console devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 562 and interfaces 560 which allow software and data to be transferred from the removable storage unit 562 to computer system 500.

[0186] Computer system 500 can also include a communications interface 564. Communications interface 564 allows software and data to be transferred between computer system 500 and external devices via communications path 566. Examples of communications interface 564 can include a modem, a network interface (such as an Ethernet card), a communications port, interfaces described above, etc. Software and data transferred via communications interface 564 are in the form of signals that can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 564, via communications path 566. Note that communications interface 564 provides a means by which computer system 500 can interface to a network such as the Internet.

[0187] The present invention can be implemented using software running (that is, executing) in an environment similar to that described above with respect to FIG. 13 and in the foregoing. In this document, the term “computer program product” is used to generally refer to removable storage unit 554, a hard disk installed in hard disk drive 550, or a carrier wave carrying software over a communication path 566 (wireless link or cable) to communication interface 564. A computer useable medium can include magnetic media, optical media, or other recordable media, or media that transmits a carrier wave or other signal. These computer program products are means for providing software to computer system 500.

[0188] Computer programs (also called computer control logic) are stored in main memory 546 and/or secondary memory 548. Computer programs can also be received via communications interface 564. Such computer programs, when executed, enable the computer system 500 to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 544 to perform features of the present invention. Accordingly, such computer programs represent controllers of the computer system 500.

[0189] The present invention can be implemented as control logic in software, firmware, hardware or any combination thereof. In an embodiment where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 500 using removable storage drive 552, hard disk drive 550, or interface 560. Alternatively, the computer program product may be downloaded to computer system 500 over communications path 566. The control logic (software), when executed by the one or more processors 544, causes the processor(s) 544 to perform functions of the invention as described herein.

[0190] In another embodiment, the invention is implemented primarily in firmware and/or hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of a hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s) from the teachings herein.

[0191] As has been noted above, the present invention could be produced in hardware or software, or in a combination of hardware and software, and these implementations would be known to one of ordinary skill in the art. The system, or method, according to the inventive principles as disclosed in connection with the preferred embodiments, may be produced in a single computer system having separate elements or means for performing the individual functions or steps described or claimed or one or more elements or means combining the performance of any of the functions or steps disclosed or claimed, or may be arranged in a distributed computer system, interconnected by any suitable means such as a local area network (LAN) or widely distributed network (WAN) over a telecommunications system (such as the Internet) as would be known to a person of ordinary skill in the art.

[0192] Although the invention has been described in specific embodiments, changes and modifications are possible that do not depart from the teachings herein. Such changes and modifications as are apparent to one skilled in this art are deemed to fall within the purview of the claims.

What is claimed is:
 1. A method for clustering data comprising the steps of a. constructing a graph of a database in which each node of the graph represents a component part of the database, and every two nodes represent neighboring component parts associated by an arc representing a coupling value, b. selecting chosen component parts as blocks with unselected neighboring component parts coupled with a selected block according to coupling values, c. coarsening the graph by performing iterated weighted aggregation wherein at each iteration of the coarsening blocks are selected and coupling values updated between unselected blocks to form larger blocks to obtain hierarchical decomposition of the database and to form a pyramid structure, d. adjusting the coupling between blocks, e. detecting saliency of segments in the pyramidal structure, f. determining which component parts belong to a segment by computing recursively a degree of attachment of every component part to each of the blocks in the pyramid, and g. scanning the pyramid from coarse to fine starting at the level a segment is detected and applying relaxation sweeps to sharpen the boundaries of a segment.
 2. A method for processing an image comprising the steps of a. constructing a graph of an image in which each node of the graph represents a pixel of the image, and every two nodes represent neighboring pixels associated by an arc representing a coupling value, b. selecting chosen pixels as blocks with unselected neighboring pixels coupled with a selected block according to coupling values, c. coarsening the graph by performing iterated weighted aggregation wherein at each iteration of the coarsening blocks are selected and coupling values updated between unselected blocks to form larger blocks to obtain hierarchical decomposition of the database and to form a pyramid structure, d. adjusting the coupling between blocks, e. detecting saliency of segments in the pyramidal structure, f. determining which pixels belong to a segment by computing recursively a degree of attachment of every pixel to each of the blocks in the pyramid, and g. scanning the pyramid from coarse to fine starting at the level a segment is detected and applying relaxation sweeps to sharpen the boundaries of a segment.
 3. A method for processing an image comprising the steps of a. constructing a graph of an image in which each node of the graph represents a pixel of the image, every edge connects a pair of neighboring pixels and a weight is associated with each edge reflecting contrast in the corresponding location in the image, b. selecting some pixels as blocks and associating unselected neighboring pixels with a selected block to form aggregates, c. detecting segments by a recursive coarsening using weighted aggregation which induces a pyramid structure over the image, the segments detected appearing as an aggregate at some level in the pyramid, d. said recursive coarsening comprising iterated weighted aggregation wherein at each iteration of the coarsening blocks are selected and weights are updated between unselected blocks to form larger blocks to obtain hierarchical decomposition of the image into aggregates, e. determining salient segments from among the segments detected in the pyramidal structure, and f. sharpening the segments to determine its boundaries more accurately.
4. The method according to claim 3 including the further step of determining which pixels belong to a segment by computing recursively a degree of attachment of every pixel to each of the blocks in the pyramid.
5. The method according to claim 3 including the further step of scanning the pyramid from coarse to fine starting at the level a segment is detected and applying relaxation sweeps to sharpen the boundaries of a segment.
6. The method according to claim 3 including the further step of computing at least one property of the aggregate it represents during recursive coarsening for every block in a new level of the pyramid.
7. The method according to claim 6 including the further step of updating weights to account for properties computed during recursive coarsening for every edge in the graph.
8. The method according to claim 3 including the further step of updating weights during recursive coarsening to increase weights between neighboring aggregates exhibiting sharp sections that connect by a smooth curve.
9. The method according to claim 3 including the further step of applying a top-down sharpening during the recursive coarsening at any given level by first going down a preselected number of levels to check the boundaries of detected segments, updating weights, and rebuilding the pyramid to the originating level to sharpen distinction between aggregates before building the pyramid to the next upper level.
10. The method according to claim 3 including the further step of going back down a preselected number of levels to check sub-aggregates regarding boundaries, update and rebuild the pyramid before proceeding to the next upper level as part of each iteration of weighted aggregation.
11. The method according to claim 3 including the further step of detecting sharp transitions in pixels in the image.
12. The method according to claim 3 including the further step of establishing a threshold to determine edge pixels in the image.
13. The method according to claim 12 including the further step of applying edge tracing by best fitting line segments of aggregates to determined edge pixels.
14. The method according to claim 13 including the further step of producing a polygonal approximation of an aggregate's boundary.
15. The method according to claim 3 including the further step of comparing the properties of neighboring aggregates.
16. The method according to claim 3 including the further step of modifying weights to control intensity contrast between aggregates during recursive coarsening.
17. The method according to claim 3 including the further step of determining variance of an aggregate relative to a neighboring aggregate.
18. The method according to claim 3 including the further step of determining multiscale variance of an aggregate to detect its texture.
19. The method according to claim 3 including the further step of determining average variance at finer scales to determine a relationship between aggregates.
20. Apparatus for clustering data comprising a computer processor programmed a. for constructing a graph of a database in which each node of the graph represents a component part of the database, and every two nodes represent neighboring component parts associated by an arc representing a coupling value, b. to select chosen component parts as blocks with unselected neighboring component parts coupled with a selected block according to coupling values, c. to coarsen the graph by performing iterated weighted aggregation wherein at each iteration of the coarsening blocks are selected and coupling values updated between unselected blocks to form larger blocks to obtain hierarchical decomposition of the database and to form a pyramid structure, d. to adjust the coupling between blocks, e. to detect saliency of segments in the pyramidal structure, f. to determine which component parts belong to a segment by computing recursively a degree of attachment of every component part to each of the blocks in the pyramid, and g. for scanning the pyramid from coarse to fine starting at the level a segment is detected and applying relaxation sweeps to sharpen the boundaries of a segment.
21. Apparatus for processing an image comprising a computer processor programmed a. for constructing a graph of an image in which each node of the graph represents a pixel of the image, and every two nodes represent neighboring pixels associated by an arc representing a coupling value, b. for selecting chosen pixels as blocks with unselected neighboring pixels coupled with a selected block according to coupling values, c. for coarsening the graph by performing iterated weighted aggregation wherein at each iteration of the coarsening blocks are selected and coupling values updated between unselected blocks to form larger blocks to obtain hierarchical decomposition of the database and to form a pyramid structure, d. for adjusting the coupling between blocks, e. for detecting saliency of segments in the pyramidal structure, f. for determining which pixels belong to a segment by computing recursively a degree of attachment of every pixel to each of the blocks in the pyramid, and g. for scanning the pyramid from coarse to fine starting at the level a segment is detected and applying relaxation sweeps to sharpen the boundaries of a segment.
22. Apparatus for processing an image comprising a computer processor programmed a. for constructing a graph of an image in which each node of the graph represents a pixel of the image, every edge connects a pair of neighboring pixels and a weight is associated with each edge reflecting contrast in the corresponding location in the image, b. for selecting some pixels as blocks and associating unselected neighboring pixels with a selected block to form aggregates, c. for detecting segments by a recursive coarsening using weighted aggregation which induces a pyramid structure over the image, the segments detected appearing as an aggregate at some level in the pyramid, d. said recursive coarsening comprising iterated weighted aggregation wherein at each iteration of the coarsening blocks are selected and weights are updated between unselected blocks to form larger blocks to obtain hierarchical decomposition of the image into aggregates, e. for determining salient segments from among the segments detected in the pyramidal structure, and f. for sharpening the segments to determine their boundaries more accurately.
23. Apparatus according to claim 22 wherein the computer processor is further programmed for determining which pixels belong to a segment by computing recursively a degree of attachment of every pixel to each of the blocks in the pyramid.
24. Apparatus according to claim 22 wherein the computer processor is further programmed for scanning the pyramid from coarse to fine starting at the level a segment is detected and applying relaxation sweeps to sharpen the boundaries of a segment.
25. Apparatus according to claim 22 wherein the computer processor is further programmed for computing at least one property of the aggregate it represents during recursive coarsening for every block in a new level of the pyramid.
26. Apparatus according to claim 25 wherein the computer processor is further programmed for updating weights to account for properties computed during recursive coarsening for every edge in the graph.
27. Apparatus according to claim 22 wherein the computer processor is further programmed for updating weights during recursive coarsening to increase weights between neighboring aggregates exhibiting sharp sections that connect by a smooth curve.
28. Apparatus according to claim 22 wherein the computer processor is further programmed for applying a top-down sharpening during the recursive coarsening at any given level by first going down a preselected number of levels to check the boundaries of detected segments, updating weights, and rebuilding the pyramid to the originating level to sharpen distinction between aggregates before building the pyramid to the next upper level.
29. Apparatus according to claim 22 wherein the computer processor is further programmed for going back down a preselected number of levels to check sub-aggregates regarding boundaries, update and rebuild the pyramid before proceeding to the next upper level as part of each iteration of weighted aggregation.
30. Apparatus according to claim 22 wherein the computer processor is further programmed for detecting sharp transitions in pixels in the image.
31. Apparatus according to claim 22 wherein the computer processor is further programmed for establishing a threshold to determine edge pixels in the image.
32. Apparatus according to claim 31 wherein the computer processor is further programmed for applying edge tracing by best fitting line segments of aggregates to determined edge pixels.
33. Apparatus according to claim 32 wherein the computer processor is further programmed for producing a polygonal approximation of an aggregate's boundary.
34. Apparatus according to claim 22 wherein the computer processor is further programmed for comparing the properties of neighboring aggregates.
35. Apparatus according to claim 22 wherein the computer processor is further programmed for modifying weights to control intensity contrast between aggregates during recursive coarsening.
36. Apparatus according to claim 22 wherein the computer processor is further programmed for determining variance of an aggregate relative to a neighboring aggregate.
37. Apparatus according to claim 22 wherein the computer processor is further programmed for determining multiscale variance of an aggregate to detect its texture.
38. Apparatus according to claim 22 wherein the computer processor is further programmed for determining average variance at finer scales to determine a relationship between aggregates.
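
By way of illustration only, and without limiting the claims, one coarsening pass of the kind recited in steps a through c of claims 3 and 22 might be sketched as follows. Everything named here is an assumption made for the example rather than part of the claimed method: the function names `build_couplings`, `select_blocks` and `coarsen`, the exponential contrast weight with parameter `alpha`, the greedy selection threshold `beta`, and the dense matrix representation (practical only for very small images).

```python
import numpy as np

def build_couplings(image, alpha=10.0):
    """Coupling matrix for a 4-connected pixel grid.
    Assumed weight: exp(-alpha * |intensity difference|), so high contrast
    between neighbors gives a weak coupling."""
    h, w = image.shape
    W = np.zeros((h * w, h * w))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    j = rr * w + cc
                    W[i, j] = W[j, i] = np.exp(-alpha * abs(image[r, c] - image[rr, cc]))
    return W

def select_blocks(W, beta=0.2):
    """Greedy block selection (an assumed rule): a node becomes a block
    unless it is already strongly coupled to previously selected blocks."""
    seeds = []
    for i in range(W.shape[0]):
        total = W[i].sum()
        to_seeds = W[i, seeds].sum() if seeds else 0.0
        if total == 0 or to_seeds < beta * total:
            seeds.append(i)
    return seeds

def coarsen(W, seeds):
    """One weighted-aggregation step.
    P[i, k] is the degree of attachment of node i to block k; the coarse
    couplings are P^T W P with the diagonal (internal weight) split off."""
    n = W.shape[0]
    col = {s: k for k, s in enumerate(seeds)}
    P = np.zeros((n, len(seeds)))
    for i in range(n):
        if i in col:
            P[i, col[i]] = 1.0            # a block belongs wholly to itself
        else:
            row = W[i, seeds]
            P[i] = row / row.sum()        # attach in proportion to coupling
    Wc = P.T @ W @ P
    internal = np.diag(Wc).copy()         # coupling kept inside each aggregate
    np.fill_diagonal(Wc, 0.0)
    return P, Wc, internal
```

Calling `coarsen` repeatedly on the returned `Wc`, each time with freshly selected blocks, would yield successively coarser levels of a pyramid. Multiplying the per-level `P` matrices together gives the attachment of every pixel to every coarse aggregate. Saliency detection, weight adjustment and boundary sharpening are not covered by this sketch.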